Processing devices with improved addressing capabilities, systems and methods

ABSTRACT

A data processing device comprising a storage circuit accessible by assertion of addresses, an arithmetic logic unit connected to the storage circuit operative to perform an arithmetic operation on data received by the arithmetic unit. Further included is an address register for storing an initial address word indicative of a storage circuit address. An instruction decode and control unit, connected to the storage circuit and having an instruction register operative to hold a program instruction is operative to decode the program instruction into control signals to control the operations of the data processing device and location codes to control data transfers according to predetermined sections of the program instruction wherein at least one of the sections includes a location section selecting the address register and a displacement section containing address data. Further included is an address generating unit connected to the storage circuit, the instruction register, and the address register responsive to the control signals from the instruction decode and control unit combining the initial address word from the address register and the address data from the displacement section to generate a storage circuit address. Other devices, systems and methods are also disclosed.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is related to coassigned applications Ser. No. ______ (TI-14044), Ser. No. ______ (TI-14610), Ser. No. ______ (TI-15600) and Ser. No. ______ (TI-14612) filed contemporaneously herewith and incorporated herein by reference. In addition, the applicants hereby incorporate by reference the following co-assigned patent documents.

[0002] a) U.S. Pat. No. 4,713,748 (TI Docket 10731)

[0003] b) U.S. Pat. No. 4,577,282 (TI Docket 9062)

[0004] c) U.S. Pat. No. 4,912,636 (TI Docket 11961)

[0005] d) U.S. Pat. No. 4,878,190 (TI Docket 113241)

[0006] e) U.S. application Ser. No. 347,967 filed May 4, 1989 (TI Docket 14145)

[0007] f) U.S. application Ser. No. 388,270 filed Jul. 31, 1989 (TI Docket 14141)

[0008] g) U.S. application Ser. No. 421,500 filed Oct. 13, 1989 (TI Docket 14205)

NOTICE

[0009] (C) Copyright 1989 Texas Instruments Incorporated. A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

[0010] 1. Field of the Invention

[0011] This invention generally relates to data processing devices, systems and methods and more particularly to communication between such devices, systems and methods.

[0012] 2. Background Art

[0013] A microprocessor device is a central processing unit or CPU for a digital processor which is usually contained in a single semiconductor integrated circuit or “chip” fabricated by MOS/LSI technology, as shown in U.S. Pat. No. 3,757,306 issued to Gary W. Boone and assigned to Texas Instruments Incorporated. The Boone patent shows a single-chip 8-bit CPU including a parallel ALU, registers for data and addresses, an instruction register and a control decoder, all interconnected using the von Neumann architecture and employing a bidirectional parallel bus for data, address and instructions. U.S. Pat. No. 4,074,351, issued to Gary W. Boone and Michael J. Cochran and assigned to Texas Instruments Incorporated, shows a single-chip “microcomputer” type device which contains a 4-bit parallel ALU and its control circuitry, with on-chip ROM for program storage and on-chip RAM for data storage, constructed in the Harvard architecture. The term microprocessor usually refers to a device employing external memory for program and data storage, while the term microcomputer refers to a device with on-chip ROM and RAM for program and data storage. In describing the instant invention, the term “microcomputer” will be used to include both types of devices, and the term “microprocessor” will be primarily used to refer to microcomputers without on-chip ROM; both terms shall be used since the terms are often used interchangeably in the art.

[0014] Modern microcomputers can be grouped into two general classes, namely general-purpose microprocessors and special-purpose microcomputers and microprocessors. General-purpose microprocessors, such as the M68020 manufactured by Motorola, Inc., are designed to be programmable by the user to perform any of a wide range of tasks, and are therefore often used as the central processing unit in equipment such as personal computers. Such general-purpose microprocessors, while having good performance for a wide range of arithmetic and logical functions, are of course not specifically designed for or adapted to any particular one of such functions. In contrast, special-purpose microcomputers are designed to provide performance improvement for specific predetermined arithmetic and logical functions for which the user intends to use the microcomputer. By knowing the primary function of the microcomputer, the designer can structure the microcomputer in such a manner that the performance of the specific function by the special-purpose microcomputer greatly exceeds the performance of the same function by the general-purpose microprocessor regardless of the program created by the user.

[0015] One such function which can be performed by a special-purpose microcomputer at a greatly improved rate is digital signal processing, specifically the computations required for the implementation of digital filters and for performing Fast Fourier Transforms. Because such computations consist to a large degree of repetitive operations such as integer multiply, multiple-bit shift, and multiply-and-add, a special-purpose microcomputer can be constructed specifically adapted to these repetitive functions. Such a special-purpose microcomputer is described in U.S. Pat. No. 4,577,282, assigned to Texas Instruments Incorporated. The specific design of a microcomputer for these computations has resulted in sufficient performance improvement over general-purpose microprocessors to allow the use of such special-purpose microcomputers in real-time applications, such as speech and image processing.

[0016] The increasing demands of technology and the marketplace make desirable even further structural and process improvements in processing devices, systems and methods of operation. These demands have led to increasing the performance of single-chip devices and single systems as state-of-the-art silicon processing technologies allow. However, some performance-hungry applications such as video conferencing, 3D graphics and neural networks require performance levels over and above that which can be achieved with a single device or system. Many such applications benefit from parallel processing.

[0017] However, performance gains from parallel processing are improved when communication overhead between processors is minimized. Thus, improvements are desirable which enhance interprocessor communications, and thus software and system development.

SUMMARY OF THE INVENTION

[0018] In general, the summary of the invention is a data processing device comprising a storage circuit accessible by assertion of addresses, an arithmetic logic unit connected to the storage circuit, operative to perform an arithmetic operation on data received by the arithmetic unit. Further included is an address register for storing an initial address word indicative of a storage circuit address. An instruction decode and control unit, connected to the storage circuit and having an instruction register operative to hold a program instruction, is operative to decode the program instruction into control signals to control the operations of the data processing device and location codes to control data transfers according to predetermined sections of the program instruction wherein at least one of the sections includes a location section selecting the address register and a displacement section containing address data. Further included is an address generating unit connected to the storage circuit, the instruction register, and the address register responsive to the control signals from the instruction decode and control unit combining the initial address word from the address register and the address data from the displacement section to generate a storage circuit address.
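
By way of illustration only, the combination performed by the address generating unit can be sketched in Python as follows. The field positions and widths (a 3-bit location section above an 8-bit unsigned displacement section), the use of simple addition as the combining operation, and all function names are assumptions made for this sketch; they are not taken from the instruction formats of FIG. 3.

```python
WORD_MASK = 0xFFFFFFFF  # thirty-two address lines, as in the described embodiment


def decode_instruction(instruction):
    """Split a program instruction into its location and displacement sections.

    Hypothetical layout: bits 10..8 select an address register,
    bits 7..0 hold an unsigned displacement.
    """
    location = (instruction >> 8) & 0x7
    displacement = instruction & 0xFF
    return location, displacement


def generate_address(address_registers, instruction):
    """Combine the initial address word with the displacement (assumed: addition)."""
    location, displacement = decode_instruction(instruction)
    initial_address = address_registers[location]
    # Wrap to the width of the address lines.
    return (initial_address + displacement) & WORD_MASK


if __name__ == "__main__":
    registers = [0] * 8
    registers[3] = 0x1000
    instruction = (3 << 8) | 0x24  # select register 3, displacement 0x24
    print(hex(generate_address(registers, instruction)))
```

In this sketch the address register itself is left unchanged; whether the generated address is written back to the register is a separate design choice that the summary does not address.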

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] The novel features believed characteristic of the invention are set forth in the appended claims. The preferred embodiments of the invention as well as other features and advantages thereof will be best understood by reference to the detailed description which follows, read in conjunction with the accompanying drawings, wherein:

[0020] FIG. 1 is an electrical diagram, in block form, of a microcomputer constructed according to the invention.

[0021] FIG. 1a is a block diagram illustrating control registers of the CPU of the microcomputer of FIG. 1.

[0022] FIG. 2a is an electrical diagram, in block form, of the communication port of the microcomputer of FIG. 1 interfaced to an analog to digital converter.

[0023] FIG. 2b is an electrical diagram, in block form, of the communication port of the microcomputer of FIG. 1 interfaced to a data processing device via an interface module.

[0024] FIG. 3 is a diagram illustrating four instruction formats of the microcomputer of FIG. 1.

[0025] FIG. 4 is an electrical diagram, in block form, of the data flow which occurs when invoking the four instruction formats illustrated in FIG. 3.

[0026] FIG. 5a is an electrical diagram, in block form, of the peripheral ports of the microcomputer of FIG. 1.

[0027] FIG. 5b is an electrical diagram, in block form, illustrating interface signals of the global peripheral port of the microcomputer of FIG. 1.

[0028] FIG. 5c is an electrical diagram, in block form, illustrating interface signals of the local peripheral port of the microcomputer of FIG. 1.

[0029] FIG. 5d is a block diagram illustrating the relationship between the bits of an address defining the current page and the bits of an address defining the addresses on a current page.

[0030] FIG. 5e is a block diagram illustrating the global peripheral interface control register of the microcomputer of FIG. 1.

[0031] FIG. 5f is a block diagram illustrating the global peripheral interface control register of the microcomputer of FIG. 1.

[0032] FIG. 5g is a block diagram illustrating the effect of the STRB ACTIVE field on the memory map of the global memory bus of the microcomputer of FIG. 1.

[0033] FIG. 6a is a timing diagram illustrating when signal RDY_ is sampled in relation to the STRB_ and H1 signals of the global peripheral port of the microcomputer of FIG. 1.

[0034] FIG. 6b is a timing diagram illustrating a read, read and write sequence to the same page of an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0035] FIG. 6c is a timing diagram illustrating a write, write and read sequence to the same page of an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0036] FIG. 6d is a timing diagram illustrating a read same page, read different page and a read same page sequence to an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0037] FIG. 6e is a timing diagram illustrating a write same page, write different page and a write same page sequence to an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0038] FIG. 6f is a timing diagram illustrating a write same page, read different page and a write different page sequence to an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0039] FIG. 6g is a timing diagram illustrating a read different page, read different page and a write same page sequence to an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0040] FIG. 6h is a timing diagram illustrating a write different page, write different page and a read same page sequence to an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0041] FIG. 6i is a timing diagram illustrating a read same page, write different page and a read different page sequence to an external memory map via the global peripheral port of the microcomputer of FIG. 1.

[0042] FIG. 7a is an electrical diagram, in block form, of the controller of the microcomputer of FIG. 1.

[0043] FIG. 7b is a timing diagram illustrating the pipelining of instruction codes performed by the controller of FIG. 7a.

[0044] FIG. 8a is a chart illustrating the properties of a delayed branch instruction, trap instruction and a delayed branch instruction.

[0045] FIG. 8b is a diagram illustrating the initiation of the delayed trap instruction in relation to the intervals of the pipeline of the microcomputer of FIG. 1.

[0046] FIG. 8c is a diagram illustrating a trap vector table of the microcomputer of FIG. 1.

[0047] FIG. 8d is a flow chart illustrating the execution of a delayed trap instruction of the microcomputer of FIG. 1.

[0048] FIG. 8e is a diagram illustrating the initiation of the repeat block delayed instruction in relation to the intervals of the pipeline of the microcomputer of FIG. 1.

[0049] FIG. 8f is an electrical diagram, in block form, of the repeat block logic contained in the CPU of the microcomputer of FIG. 1.

[0050] FIG. 8g is a flow chart illustrating the execution of a repeat block delayed instruction of the microcomputer of FIG. 1.

[0051] FIG. 9 is an electrical diagram, in block form, of the instruction cache of the microcomputer of FIG. 1.

[0052] FIG. 10 is an electrical diagram, in block form, of the DMA coprocessor of the microcomputer of FIG. 1.

[0053] FIG. 11 is a block diagram of the split-mode DMA operation of the microcomputer of FIG. 1.

[0054] FIG. 12a is a diagram illustrating the rotating priority scheme implemented for the six DMA channels of the microcomputer of FIG. 1.

[0055] FIG. 12b is a diagram illustrating the rotating priority scheme implemented for split-mode DMA operation of the microcomputer of FIG. 1.

[0056] FIG. 13 is an electrical diagram, in block form, of the peripheral modules and peripheral bus of the microcomputer of FIG. 1.

[0057] FIG. 14 is an electrical diagram, in block form, of two communication ports directly interfaced.

[0058] FIG. 15 is an electrical diagram, in block form, of the communication port of the microcomputer of FIG. 1.

[0059] FIG. 16 is a state diagram, in block form, of the communication port arbitration unit of the microcomputer of FIG. 1.

[0060] FIG. 17 illustrates the signal convention used between two connected communication ports A and B.

[0061] FIG. 18a is a timing diagram illustrating a token transfer between communication ports A and B.

[0062] FIG. 18b is a timing diagram illustrating data transfer between communication ports A and B.

[0063] FIG. 19 illustrates a stand-alone configuration of the improved data processing device of FIG. 1 configured to show connection to a plurality of memory and peripheral devices, as well as connection to other systems via communication ports.

[0064] FIG. 20 illustrates a parallel processing system architecture with external memory in the form of building blocks.

[0065] FIG. 21 illustrates a single data processing device without external memory in the form of building blocks.

[0066] FIG. 22 illustrates another parallel processing system architecture in a pipelined linear array or systolic array.

[0067] FIG. 23 illustrates another parallel processing system architecture in the form of a bidirectional ring.

[0068] FIG. 24 illustrates another parallel processing system architecture in the form of a tree.

[0069] FIG. 25 illustrates another parallel processing system architecture wherein the communication ports are used to support a variety of two-dimensional structures such as a lattice.

[0070] FIG. 26 illustrates another parallel processing system architecture wherein a two-dimensional structure, in the form of a hexagonal mesh, is constructed.

[0071] FIG. 27 illustrates another parallel processing system architecture using a three-dimensional grid or cubic lattice.

[0072] FIG. 28 illustrates another parallel processing system architecture where a four-dimensional hypercube structure is utilized.

[0073] FIG. 29 illustrates another parallel processing system architecture which illustrates a combination of shared memory and processor-to-processor communication.

[0074] FIG. 30 illustrates yet another configuration of parallel processing system architecture wherein communication ports and support for shared global memory permit a variety of configurations.

[0075] FIG. 31 illustrates another parallel processing system architecture wherein a plurality of improved data processing devices of FIG. 1 interface to global and local memory.

[0076] FIG. 32 illustrates yet another configuration of parallel processing system architecture where a plurality of data processing devices of FIG. 1 share a plurality of global memories.

[0077] FIG. 33 illustrates another configuration of parallel processing system architecture where communication between some processors is established via modems.

[0078] FIG. 34 illustrates an example robotic structure that utilizes the parallel processing system architecture.

[0079] FIG. 35 illustrates a circuit used to multiplex data for the three-operand addressing instructions.

[0080] FIG. 36a illustrates a circuit which counts the three instructions fetched after a delayed trap instruction.

[0081] FIG. 36b illustrates an incrementer used in the implementation of the delayed trap instructions.

[0082] Corresponding numerals and other symbols refer to corresponding parts in the various figures of drawings except where the context indicates otherwise.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0083] Referring now to FIG. 1, the architecture of a microcomputer 10 is shown, said microcomputer being specially adapted to digital signal processing and incorporating the instant invention. The major functional blocks of microcomputer 10 are constituted by central processing unit (CPU) 12, controller 14, and direct memory access (DMA) coprocessor 22. The memory contained in microcomputer 10 according to this embodiment of the invention includes random access memories (RAMs) 16 and 18, and read-only memory (ROM) 20. RAMs 16 and 18 each contain, in this embodiment, 2¹⁰, or 1K, words; ROM 20 contains 2¹², or 4K, words. External connection is made by way of peripheral ports 24 and 26, which multiplex various bus signals onto external terminals of microcomputer 10 and which provide special purpose signals for communication to external devices which are to receive and send data via such external terminals. Connected to peripheral port 25 is peripheral bus 28, which is adapted to be connected to various peripheral function blocks as will be explained hereinbelow.

[0084] Data communication within microcomputer 10 can be effected by way of data bus 30. Data bus 30 contains a set of data lines 30 d which are dedicated to the communication of data signals among memories 16, 18 and 20, peripheral ports 24, 25 and 26, and CPU 12. In this embodiment of the invention, data bus 30 contains thirty-two data lines in set 30 d; accordingly, the data signals communicated among memories 16, 18 and 20, peripheral ports 24, 25 and 26, and CPU 12 are considered as thirty-two bit words. Data bus 30 further contains a first set of address lines 30 a and a second set of address lines 30 b, both of which are for communication of address signals corresponding to memory locations in memories 16, 18 and 20. In this embodiment of the invention, data bus 30 contains thirty-two address lines in each of sets 30 a and 30 b. Address lines 30 a and 30 b are also connected among CPU 12, peripheral ports 24, 25 and 26, and memories 16, 18 and 20. As is evident from FIG. 1, memories 16, 18 and 20 each have two ports 32 a and 32 d. Each of ports 32 a is connected to address lines 30 a and 30 b of data bus 30, and receives the address signals presented thereupon to provide access to the corresponding memory location by way of port 32 d to data lines 30 d of data bus 30.

[0085] Microcomputer 10 also effects communication by way of program bus 34. Like data bus 30, program bus 34 contains a set of data lines 34 d connected to ports 32 d of memories 16, 18 and 20. Data lines 34 d of program bus 34 are also connected to peripheral ports 24, 25 and 26, and to controller 14. Program bus 34 further contains a set of address lines 34 a, which are connected to ports 32 a of memories 16, 18 and 20, to peripheral ports 24, 25 and 26, and to controller 14. Also connected to program bus 34 is instruction cache 36, which also has ports 32 a and 32 d connected to address lines 34 a and data lines 34 d, respectively. Instruction cache 36 is a small (128 word) high speed memory which is used to retain the most recently used instruction codes so that, if external memory devices are used for program storage, the retrieval of repetitively used instructions can be effected at the same rate as from memories 16, 18 and 20. Detailed construction and operation of instruction cache 36 is given hereinbelow. Controller 14 contains such circuitry as required to decode instruction codes received on data lines 34 d of program bus 34 into control signals which control the specific logic circuitry contained in all blocks of microcomputer 10. FIG. 1 illustrates lines SEL₁₆, SEL₁₈, SEL₂₀, SEL₂₄, SEL₂₅ and SEL₂₆ which carry certain of these control signals to control access of microcomputer 10 to memories 16, 18, and 20, and peripheral ports 24, 25 and 26, respectively. Control signals CNTL14 provide communication controls between CPU 12 and communication ports 50 through 55; other such control signals generated by controller 14 are not shown in FIG. 1, for purposes of clarity. Because of its connection to instruction cache 36 and to controller 14, program bus 34 is used primarily for the addressing and communication of instruction codes contained in memories 16, 18 and 20. According to the invention, such instruction codes can reside in any of memories 16, 18 and 20, or in external memory, without designation of any specific locations as dedicated to program memory.

[0086] DMA coprocessor 22 is connected to memories 16, 18 and 20 by way of DMA bus 38. Like data bus 30 and program bus 34, DMA bus 38 has a set of data lines 38 d which are connected to ports 32 d of memories 16, 18 and 20. DMA bus 38 further has a set of address lines 38 a connected to ports 32 a of memories 16, 18 and 20. DMA coprocessor 22 is also connected to peripheral bus 28, and to peripheral ports 24, 25 and 26. DMA coprocessor 22 effects direct memory access operations, by which blocks of data stored within the memory space of microcomputer 10 may be moved from one area of memory (the source) to another (the destination). The source area of memory may be within memories 16, 18 or 20, or in memory devices external to microcomputer 10 which are connected to the terminals served by peripheral ports 24 and 26, and the destination of the block of data may be in any of such memories (except of course ROM 20). It is apparent from the construction of microcomputer 10 as shown in FIG. 1, and from the descriptive name given (DMA coprocessor 22), that such DMA operations may be effected by DMA coprocessor 22 in microcomputer 10 without requiring the intervention of CPU 12.

[0087] At the conclusion of a block transfer, the DMA coprocessor 22 can be programmed to do several things: an interrupt can be generated to signal that the block transfer is complete; the DMA channel can stop until reprogrammed; or, most importantly, the DMA channel can autoinitialize itself at the start of the next block transfer for effectuating another block transfer by obtaining a new source and destination area space within memories 16, 18 or 20 or in memory devices external to microcomputer 10 which are connected to the terminals served by peripheral ports 24 and 26. This autoinitialization for effectuating another block transfer is done without any intervention by the CPU.

[0088] Six specialized communication ports 50 through 55 are served by peripheral port 25 and peripheral bus 28. Communication ports 50 through 55 provide additional means for external data transfers. Control signals DMA22 provide communication controls between DMA coprocessor 22 and communication ports 50-55. FIGS. 2a and 2b illustrate the versatility of the communication ports. In FIG. 2a, the communication port is connected to a stream oriented device such as an analog to digital (A/D) converter. It should be noted that control and data signals 585 are properly matched. Utilizing the input and output first-in-first-out (FIFO) buffers 540 and 550, the communication port provides a buffered interface for the stream oriented device. Other stream oriented devices include a digital to analog (D/A) converter. FIG. 2b shows another data processing device connected to the communication port via interface 590. It is apparent from the examples in FIGS. 2a and 2b that interfacing to the communication ports is readily accomplished through the use of devices with proper interface signals 585 built onto the device or through the use of an interfacing module 590 that is designed to provide proper interface signals 585 to existing devices not built to accommodate the communication port.

[0089] Each one of the communication ports 50 through 55 provides a bidirectional interface 580 with an eight word (thirty-two bits/word) deep input first-in-first-out (FIFO) buffer 540 and an eight word deep output FIFO buffer 550. Arbitration and handshaking circuitry 500 is self contained within each communication port for effectuating external communications via control and data lines 585. A detailed description of the communication ports 50 through 55 is given below. It should be noted that the preferred embodiment of microcomputer 10 has a special split-mode operation that utilizes the DMA coprocessor 22 and communication ports 50 through 55. In split-mode one DMA channel is transformed into two channels: one DMA channel is dedicated to receiving data from a communication port (the source) and writing it to a location in the memory map (the destination); and one DMA channel is dedicated to reading data from a location in the memory map (the source) and writing it to a communication port (the destination). Details of the split-mode DMA will be further described below.
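
As a rough software model only, the buffered port and the two split-mode channel roles described above might be sketched in Python as follows. The class and function names, the dictionary used to stand in for the memory map, and the policy of simply stopping when a FIFO is full or empty are assumptions of this sketch; the handshaking of circuitry 500 is not modelled.

```python
from collections import deque

FIFO_DEPTH = 8          # eight words deep, per the described embodiment
WORD_MASK = 0xFFFFFFFF  # thirty-two bits per word


class WordFifo:
    """Bounded first-in-first-out buffer of 32-bit words."""

    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self._words = deque()

    def is_empty(self):
        return len(self._words) == 0

    def is_full(self):
        return len(self._words) >= self.depth

    def push(self, word):
        """Store a word; return False (and store nothing) if the FIFO is full."""
        if self.is_full():
            return False
        self._words.append(word & WORD_MASK)
        return True

    def pop(self):
        """Return the oldest word, or None if the FIFO is empty."""
        return self._words.popleft() if self._words else None


class CommunicationPort:
    """One port with separate input and output FIFOs (hypothetical model)."""

    def __init__(self):
        self.input_fifo = WordFifo()
        self.output_fifo = WordFifo()


def receive_channel(port, memory, destination):
    """Split-mode receive role: port input FIFO (source) to memory (destination)."""
    count = 0
    while not port.input_fifo.is_empty():
        memory[destination + count] = port.input_fifo.pop()
        count += 1
    return count


def transmit_channel(port, memory, source, length):
    """Split-mode transmit role: memory (source) to port output FIFO (destination)."""
    sent = 0
    while sent < length and port.output_fifo.push(memory[source + sent]):
        sent += 1
    return sent
```

In this model a transmit request longer than eight words stops once the output FIFO fills; a fuller model would resume as the external device drains the FIFO.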

[0090] There are six DMA channels in the preferred embodiment; each of them is capable of performing all of the functions described hereinabove. Since all six DMA channels use the same DMA bus 38 and peripheral bus 28 to effectuate their block transfers, conflicts for DMA accesses might occur between the channels. Thus, the DMA coprocessor 22 also functions to arbitrate requests from any or all of the six DMA channels requesting access to the DMA bus 38 and peripheral bus 28. The DMA coprocessor 22 implements a rotating priority scheme to ensure that any channel requesting bus access will in turn be serviced. Details of the rotating priority scheme will be further described below.
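
One common form of rotating priority can be sketched in Python as shown below, purely as an illustration. The sketch assumes that the channel most recently serviced drops to the lowest priority and that the next higher-numbered channel becomes the highest; the scheme actually used by DMA coprocessor 22 is the one described with reference to FIG. 12a and may differ. The class and method names are hypothetical.

```python
NUM_CHANNELS = 6  # six DMA channels in the preferred embodiment


class RotatingArbiter:
    """Grant bus access among requesting channels using rotating priority."""

    def __init__(self, num_channels=NUM_CHANNELS):
        self.num_channels = num_channels
        # Start as if the last channel had just been serviced,
        # so that channel 0 initially has the highest priority.
        self.last_serviced = num_channels - 1

    def grant(self, requests):
        """Return the channel to service from the set `requests`, or None."""
        for offset in range(1, self.num_channels + 1):
            channel = (self.last_serviced + offset) % self.num_channels
            if channel in requests:
                # The serviced channel becomes the lowest priority next time.
                self.last_serviced = channel
                return channel
        return None


if __name__ == "__main__":
    arbiter = RotatingArbiter()
    pending = {0, 2, 5}
    print([arbiter.grant(pending) for _ in range(4)])
```

Because the search always starts just after the channel last serviced, every channel that keeps its request asserted is reached within one full rotation, which is the property the paragraph above attributes to the scheme.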

[0091] Ports 32 a are primarily multiplexers, so that selection of one set of address lines 30 a, 30 b, 34 a, or 38 a for connection to its associated memory 16, 18 or 20 can be effected. Similarly, each of ports 32 d is connected to data lines 30 d of data bus 30, for communication of the data stored (or to be stored) by the addressed memory location. Memories 16, 18 and 20 each contain an address decoder 33, connected to its port 32 a, for decoding the memory address signal presented on the selected one of said address lines 30 a, 30 b, 34 a, or 38 a. Based on the output from address decoder 33, access is granted to the memory location specified by the selected address signal. RAMs 16 and 18, and ROM 20, are all constructed so that the selected memory location is sensed and/or written based upon the output of address decoder 33 therewithin. Ports 32 d provide a high-impedance output to the data lines of buses 30, 34 and 38 connected thereto when not selected, thereby preventing data conflicts on buses 30, 34 and 38.

[0092] Each of the sets of address lines in data bus 30, program bus 34 and DMA bus 38 consists of thirty-two conductors in the preferred embodiment of this invention. Accordingly, the maximum number of memory locations addressable by way of the sets of address lines in data bus 30, program bus 34 and DMA bus 38 is 2³² words (four Giga-words) of thirty-two bits. However, since the total number of words in memories 16, 18 and 20 is 6K, a large amount of the addressable memory space of microcomputer 10 may reside in memory devices external to microcomputer 10. Such external memory has address decoding capability, similar to the on-chip memories 16, 18 and 20, and responds to the generated address signals on the address lines of buses 30, 34 and 38 in a similar fashion. In the preferred embodiment, a single memory address space is provided for microcomputer 10, so that a given address signal presented on any given set of address lines of buses 30, 34 and 38 will address a memory location in only one of memories 16, 18 and 20. Therefore, using the example of address lines 30 a being selected by ports 32 a, a given address signal on address lines 30 a will correspond to a memory location in only one of memories 16, 18 and 20, or in external data, program or input/output memory. It should be noted that microcomputer 10 is organized in such a fashion that it is preferable that external data and program memory be accessed by way of peripheral ports 24 and 26, and that internal input/output memory be accessed by way of peripheral port 25.
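
The arithmetic above, and the single-address-space property, can be checked with a short Python sketch. The word counts come from the text; the base addresses in MEMORY_MAP are hypothetical placeholders chosen only so that the regions do not overlap, and do not represent the actual memory map of microcomputer 10.

```python
ADDRESS_LINES = 32
ADDRESSABLE_WORDS = 2 ** ADDRESS_LINES  # 2^32 words, i.e. four Giga-words

# On-chip memory sizes in words, per the described embodiment.
ON_CHIP_WORDS = {"RAM 16": 2 ** 10, "RAM 18": 2 ** 10, "ROM 20": 2 ** 12}
TOTAL_ON_CHIP_WORDS = sum(ON_CHIP_WORDS.values())  # 1K + 1K + 4K = 6K words

# Hypothetical, non-overlapping placement: (name, base address, size in words).
MEMORY_MAP = [
    ("ROM 20", 0x0000, ON_CHIP_WORDS["ROM 20"]),
    ("RAM 16", 0x1000, ON_CHIP_WORDS["RAM 16"]),
    ("RAM 18", 0x1400, ON_CHIP_WORDS["RAM 18"]),
]


def decode(address):
    """Return the single memory that responds to `address`."""
    if not 0 <= address < ADDRESSABLE_WORDS:
        raise ValueError("address does not fit on thirty-two address lines")
    for name, base, size in MEMORY_MAP:
        if base <= address < base + size:
            return name
    # Everything not claimed on-chip is left to external memory or peripherals.
    return "external"


if __name__ == "__main__":
    print(ADDRESSABLE_WORDS, TOTAL_ON_CHIP_WORDS)
    print(decode(0x0FFF), decode(0x1000), decode(0x1800))
```

Because the regions do not overlap, each address selects at most one on-chip memory, mirroring the statement that a given address signal corresponds to a location in only one memory.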

[0093] Peripheral bus 28 is connected between peripheral port 25 and various peripheral functions. Peripheral bus 28 is therefore selectively connectable to any one of buses 30, 34 and 38, depending upon the control of peripheral port 25 by controller 14. In this manner, peripheral bus 28 appears to the remainder of microcomputer 10 as an off-chip bus. This provides for such functions as normally provided by peripheral devices to be incorporated into microcomputer 10; communications with such peripheral devices are performed by the remainder of microcomputer 10 in much the same way as with an off-chip device. By way of example, microcomputer 10 of FIG. 1 has timers 40 and 41, analysis module 42 and six communication ports 50-55 attached to peripheral bus 28. Like the other buses described above, peripheral bus 28 contains data lines 28 d and address lines 28 a. In contrast to the communication between memories 16, 18 and 20 and the remainder of microcomputer 10 connected to buses 30, 34 and 38, however, address lines 28 a of peripheral bus 28 are used to select one of said peripherals 40, 41, 42 or communication ports 50-55 connected thereto to receive or transmit data from or to data lines 28 d of peripheral bus 28. In addition, as will be described below, control registers in DMA coprocessor 22 and in communication ports 50-55 are also accessed by way of peripheral bus 28.

[0094] The construction and operation of a CPU similar to CPU 12, and its addressing modes, are described in the incorporated U.S. Pat. No. 4,912,636. However, CPU 12 is modified to embody a larger multiplier capable of handling thirty-two bits by thirty-two bits integer multiplies and forty bits by forty bits floating point multiplies. CPU 12 incorporates a reciprocal seed ROM used to compute an approximation to 1/B where B is the divisor. A reciprocal square root seed ROM is also present for generating a seed approximating the reciprocal of the square root of the operand for square root calculations. The advantages and details of the operation of the seed ROM are described in U.S. Pat. No. 4,878,190 assigned to Texas Instruments Incorporated (TI Docket 13241) which is incorporated herein by reference.

[0095] FIG. 1a shows a number of control registers 160 of the preferred embodiment of CPU 12. Interrupt and trap vector table pointers 161 are each 32-bit registers. Unlike other control registers for CPU 12, which reside within CPU 12, these registers reside in a CPU 12 expansion register file located away from CPU 12. Since interrupt and trap vector table pointers 161 are control registers of CPU 12, CPU 12 accesses the registers at various times. Thus, instructions are available to perform a load from an expansion register to a primary register for use by CPU 12. Conversely, a command is available to perform a load from a primary register to an expansion register when the primary register is loaded with control data from another control register within CPU 12.

[0096] The interrupt vector table pointer (IVTP) points to the interrupt vector table (IVT), which contains addresses of the first instruction of interrupt routines.

[0097] The trap vector table pointer (TVTP) points to the trap vector table (TVT), which contains addresses of the first instruction of trap routines.

[0098] Interrupt and trap routines are instructions that are executed during the execution of the main program to accommodate situations confronted by microcomputer 10 of the preferred embodiment.

[0099] The CPU and DMA interrupt mask and flags 162 are 32-bit registers. The mask registers are used to enable or disable interrupts, while the flag registers are set by devices to indicate that a condition has occurred.

[0100] The stack pointer (SP) 163 is a 32-bit register that contains the address of the top of the system stack. The SP points to the last element pushed onto the stack.

[0101] Block repeat registers 164 are 32-bit registers containing the starting and ending addresses of the block of program memory to be repeated when operating in the repeat mode.

[0102] The status register 165 is a 32-bit register containing global information relating to the state of CPU 12.

[0103] Index registers 166 are 32-bit registers used by the auxiliary register arithmetic units for indexing addresses. The incorporated U.S. Pat. No. 4,912,636 describes the operations of indexing addresses.

[0104] The preferred embodiment has improved three-operand addressing instructions. Three-operand addressing includes two data fetches for operands and one data load of the result into a register file, together with further features. The data fetches selectively supported by the preferred embodiment are: immediate data from the instruction, memory data located at a displacement from an auxiliary register, and a register in the register file. The four instruction formats are shown in FIG. 3. The description herein below mainly discusses the improvement of the instruction formats and thus concentrates on the scr1 and scr2 fields. The scr1 and scr2 fields determine the operands for ALU 130 shown in FIG. 4. Rn field 120 of the instruction is a five bit field used to address a register in register file 131 as shown in FIG. 4. Immediate field 121 of the instruction is immediate data residing in the instruction word that is decoded and extracted by instruction decode and control 202. ARn 122 and ARm 123 correspond with dispn 124 and dispm 125 of the instruction, respectively, to effectuate indirect addressing as described in the incorporated U.S. Pat. No. 4,912,636. AR file 132 and auxiliary ALUs 133 and 134 are used to effectuate the indirect addresses for the data operands residing in memory 135.

[0105] Referring to FIG. 4, the instruction word held in instruction register 94 is decoded by instruction decode and control 202, where appropriate control and data signals are generated. For example, the ARn field 122 and ARm field 123 are decoded, and signals ARn_select and ARm_select are generated to select address data from address register (AR) file 132. The fields dispn 124 and dispm 125 are decoded and extracted from the instruction word and sent to auxiliary ALUs 133 and 134, where they are combined with the address data from AR file 132. Addresses corresponding to locations in memory 135 are generated, and operands are fetched and fed to ALU 130. The immediate field 121 is decoded and extracted from the instruction word and becomes an operand to ALU 130. The Rn field 120 is decoded by instruction decode and control 202, and signal Rn_select is generated to select the contents of Rn from register file 131. The dst field 126 is decoded by instruction decode and control 202, and signal dst_select is generated to select the destination register in register file 131 that stores the result of the operation from ALU 130. The operation field is decoded and extracted by instruction decode and control 202 to control the operation of ALU 130. Since fields 128 and 129 are not pertinent to the understanding of the improved three-operand instruction, they are not discussed, for purposes of clarity.
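The operand selection described above can be illustrated with a minimal sketch. It models already-decoded fields rather than instruction bit positions, which are not given here. The class names `Reg`, `Imm` and `Indirect`, the functions `fetch` and `execute`, the eight-entry AR file, and the use of plain addition in the auxiliary ALU are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

MASK32 = 0xFFFFFFFF

@dataclass
class Reg:
    """Operand held in register Rn of the register file (hypothetical model)."""
    n: int

@dataclass
class Imm:
    """Operand taken from the immediate field of the instruction word."""
    value: int

@dataclass
class Indirect:
    """Operand in memory at a displacement from auxiliary register ARn."""
    ar: int
    disp: int

Operand = Union[Reg, Imm, Indirect]

def fetch(op: Operand, regs: List[int], ars: List[int], memory: Dict[int, int]) -> int:
    """Return the value of one source operand."""
    if isinstance(op, Reg):
        return regs[op.n]
    if isinstance(op, Imm):
        return op.value
    # Auxiliary ALU step: combine the AR contents with the displacement.
    # Addition is assumed here; the text only says the two are "combined".
    address = (ars[op.ar] + op.disp) & MASK32
    return memory[address]

def execute(operation: Callable[[int, int], int], src1: Operand, src2: Operand,
            dst: int, regs: List[int], ars: List[int], memory: Dict[int, int]) -> int:
    """Two operand fetches, one ALU operation, one load into the register file."""
    result = operation(fetch(src1, regs, ars, memory),
                       fetch(src2, regs, ars, memory)) & MASK32
    regs[dst] = result
    return result

# Usage: R1 = mem[AR2 + 5] + R3
regs = [0] * 32          # a five bit Rn field addresses 32 registers
ars = [0] * 8            # AR file size is an assumption
memory = {0x105: 7}
ars[2], regs[3] = 0x100, 5
execute(lambda a, b: a + b, Indirect(ar=2, disp=5), Reg(3), 1, regs, ars, memory)
```

Keeping the three operand kinds as separate types mirrors the way each source field selects a different data path into ALU 130.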

[0106] The four additional three-operand instruction formats shown in FIG. 3 are developed to support the most common form of data addressing required for compiled code. As a result, these instructions reduce code size for both hand assembled and compiled code. Thus, noticeable improvements are realized in the speed and efficiency at which microcomputer 10 can perform its programmed tasks.

[0107] Referring now to FIG. 5a, the construction of peripheral ports 24, 25 and 26 is described in detail. Peripheral ports 24, 25 and 26 are connected to data bus 30, program bus 34 and DMA bus 38, as described with reference to FIG. 1. Peripheral port 24 consists primarily of a multiplexer 100, which selectively connects external data lines GD_(n) to data lines 30d of data bus 30, data lines 34d of program bus 34 or data lines 38d of DMA bus 38, responsive to control signals generated on lines SEL₂₄ by controller 14. It should be noted that multiplexer 100 creates a bidirectional connection between external data lines GD_(n) and the data lines 30d, 34d or 38d, so that data may be received or presented therebetween. In addition, multiplexer 102 selectively connects external address lines GA_(n) to address lines 30a or 30b of data bus 30, address lines 34a of program bus 34, or address lines 38a of DMA bus 38, also responsive to controller 14 depending upon which data lines are connected by multiplexer 100 to data lines GD_(n).

[0108] Peripheral port 26 is constructed similarly to peripheral port 24, but is controlled by lines SEL₂₆ independently from peripheral port 24, so that communication at peripheral ports 24, 25 and 26 can occur simultaneously and independently, so long as the buses 30, 34 and 38 used by the ports are not simultaneously used. Peripheral port 26 is an additional peripheral port having the same capabilities as peripheral port 24. Accordingly, as shown in FIG. 5a, peripheral port 26 contains multiplexers 108 and 110 corresponding to like components in peripheral port 24.

[0109] Control and operation of the two external peripheral interfaces of the preferred embodiment, namely global peripheral port 24 (or global memory interface) and local peripheral port 26 (or local memory interface), are now discussed in detail. For purposes of this discussion the two ports are functionally identical, so the discussion of global peripheral port 24 also applies to local peripheral port 26. FIG. 5b shows the interface signals for global peripheral port 24, and FIG. 5c shows the interface signals for local peripheral port 26.

[0110] Global peripheral port 24 has separate 32-bit data and 32-bit address buses. Two sets of control signals are available for interfacing with multiple devices. Multiple sets of control signals are particularly advantageous if interfacing devices operate at access times slower than peripheral port 24. Time spent waiting (idle time) for one external device to respond can then be used to access another external device, so that the data throughput of global peripheral port 24 is maximized.

[0111] Control signals STRB0_ and STRB1_ are shown in FIG. 5b. It should be noted that signal names shown in the Figures with over bars above the signal name correspond to signal names having a suffix “_” in the text. STRB0_ and STRB1_ become active to signal the interval during which valid information and control signals can be passed between peripheral port 24 and the connected external device. R/W0_ and R/W1_ specify the direction of the flow of data through peripheral port 24. Control signals RDY0_ and RDY1_ are used to signal that valid data is available on the selected bus. Control signals PAGE0 and PAGE1 signal the transition to perform data operations on another page of a page partitioned memory.

[0112] The preferred embodiment, using a 32-bit address, has independent page sizes for the different sets of external strobes. This feature allows great flexibility in the design of external high-speed, high-density memory systems and the use of slower external peripheral devices. Both the STRB0 PAGESIZE and STRB1 PAGESIZE fields work in the same manner. The PAGESIZE field specifies the page size for the corresponding strobe. The PAGESIZE field is discussed herein-below. Table 1.1 illustrates the relationship between the PAGESIZE field and the bits of the address used to define the current page and the resulting page size. The page size ranges from 256 words, with external address bus bits 7-0 defining the location on a page, up to 2 Giga words, with external address bus bits 30-0 defining the location on a page. FIG. 5d illustrates an external address showing the relationship between the bits of an address defining the current page and the bits of an address defining the addresses on a current page. As shown in Table 1.1, the field for external address bus bits defining addresses on a page increases as the number of addressable words on a page (i.e., the page size) increases. Inversely, the number of bits defining the current page increases as the number of addressable pages increases. The trade off between bits used to address pages and words is shown in Table 1.1.

TABLE 1.1

| PAGESIZE field | External address bus bits defining the current page | External address bus bits defining address on a page | Page size (32-bit words) |
|---|---|---|---|
| 11111 | Reserved | Reserved | Reserved |
| 11110 | None | 30-0 | 2³¹ = 2G |
| 11101 | 30 | 29-0 | 2³⁰ = 1G |
| 11100 | 30-29 | 28-0 | 2²⁹ = 512M |
| 11011 | 30-28 | 27-0 | 2²⁸ = 256M |
| 11010 | 30-27 | 26-0 | 2²⁷ = 128M |
| 11001 | 30-26 | 25-0 | 2²⁶ = 64M |
| 11000 | 30-25 | 24-0 | 2²⁵ = 32M |
| 10111 | 30-24 | 23-0 | 2²⁴ = 16M |
| 10110 | 30-23 | 22-0 | 2²³ = 8M |
| 10101 | 30-22 | 21-0 | 2²² = 4M |
| 10100 | 30-21 | 20-0 | 2²¹ = 2M |
| 10011 | 30-20 | 19-0 | 2²⁰ = 1M |
| 10010 | 30-19 | 18-0 | 2¹⁹ = 512K |
| 10001 | 30-18 | 17-0 | 2¹⁸ = 256K |
| 10000 | 30-17 | 16-0 | 2¹⁷ = 128K |
| 01111 | 30-16 | 15-0 | 2¹⁶ = 64K |
| 01110 | 30-15 | 14-0 | 2¹⁵ = 32K |
| 01101 | 30-14 | 13-0 | 2¹⁴ = 16K |
| 01100 | 30-13 | 12-0 | 2¹³ = 8K |
| 01011 | 30-12 | 11-0 | 2¹² = 4K |
| 01010 | 30-11 | 10-0 | 2¹¹ = 2K |
| 01001 | 30-10 | 9-0 | 2¹⁰ = 1K |
| 01000 | 30-9 | 8-0 | 2⁹ = 512 |
| 00111 | 30-8 | 7-0 | 2⁸ = 256 |
| 00110-00000 | Reserved | Reserved | Reserved |
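The rows of Table 1.1 follow a regular pattern: a PAGESIZE value f selects address bits f-0 as the location on a page, giving a page of 2^(f+1) words. The sketch below reproduces the table from that observation; the function name `decode_pagesize` and the dictionary layout are hypothetical.

```python
from typing import Optional, Dict, Any

def decode_pagesize(field: int) -> Optional[Dict[str, Any]]:
    """Map a 5-bit PAGESIZE value to the bit ranges and page size of Table 1.1.

    Returns None for the reserved encodings (00000-00110 and 11111).
    Bit ranges are (high, low) tuples of external address bus bit numbers.
    """
    if not 0 <= field <= 0b11111:
        raise ValueError("PAGESIZE is a 5-bit field")
    if field < 0b00111 or field == 0b11111:
        return None
    offset_width = field + 1                      # bits field..0 address a word on the page
    page_bits = (30, offset_width) if offset_width <= 30 else None
    return {
        "page_bits": page_bits,                   # None when the whole range is one page
        "offset_bits": (field, 0),
        "page_size_words": 1 << offset_width,
    }

# Usage: the smallest page size in the table
smallest = decode_pagesize(0b00111)
```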

[0113] Changing from one page to another has the effect of inserting a cycle in the external access sequence for external logic to reconfigure itself in an appropriate way. The memory interface control logic 104 keeps track of the address used for the last access for each STRB_. When an access begins, the page signal corresponding to the active STRB_ goes inactive if the access is to a new page. The PAGE0 and PAGE1 signals are independent of one another, each having its own page size logic.
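The page tracking described above amounts to comparing the page bits of each access with those of the previous access on the same strobe. A minimal sketch follows, with one tracker instance per strobe. The class name `PageTracker`, the decision to ignore address bit 31 (Table 1.1 lists page bits only up to bit 30), and the choice not to report a page change on the very first access are assumptions.

```python
from typing import Optional

class PageTracker:
    """Remembers the page of the last access for one strobe (hypothetical model)."""

    def __init__(self, pagesize_field: int) -> None:
        # A PAGESIZE value f means bits f..0 are the offset within a page.
        self.offset_width = pagesize_field + 1
        self.last_page: Optional[int] = None

    def access(self, address: int) -> bool:
        """Record an access and return True if it falls on a new page."""
        page = (address & 0x7FFFFFFF) >> self.offset_width   # bits 30..(f+1)
        changed = self.last_page is not None and page != self.last_page
        self.last_page = page
        return changed

# Usage: 256-word pages (PAGESIZE = 00111)
tracker = PageTracker(0b00111)
results = [tracker.access(a) for a in (0x80000010, 0x800000FF, 0x80000100, 0x80000101)]
```

Because each strobe has its own page size logic, two independent trackers with different PAGESIZE values would model PAGE0 and PAGE1.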

[0114] Referring to FIG. 5b, control signals CE0_ and CE1_ are control enable signals. CE0_ causes lines R/W0_, STRB0_ and PAGE0 to be in the high-impedance state. Similarly, control signal CE1_ causes lines R/W1_, STRB1_ and PAGE1 to be in the high-impedance state.

[0115] The preferred embodiment has separate enable signals for the data bus and address bus. Signal DE_ controls the data bus, and signal AE_ controls the address bus, which has 31 bits. Four bits (STAT3 through STAT0) define the current status of the peripheral port, as set out in Table 1.2. The status signals identify STRB0_ and STRB1_ accesses, data reads and writes, DMA reads and writes, program reads, and SIGI (SIGnal Interlock) reads.

[0116] Signal interlock is used in configurations where global memory is shared by multiple processors. In order to allow multiple processors to access the global memory and share data in a coherent manner, handshaking and arbitration are necessary.

TABLE 1.2

| STAT3 | STAT2 | STAT1 | STAT0 | Status |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | STRB0_ access, program read |
| 0 | 0 | 0 | 1 | STRB0_ access, data read |
| 0 | 0 | 1 | 0 | STRB0_ access, DMA read |
| 0 | 0 | 1 | 1 | STRB0_ access, SIGI read |
| 0 | 1 | 0 | 0 | Reserved |
| 0 | 1 | 0 | 1 | STRB0_ access, data write |
| 0 | 1 | 1 | 0 | STRB0_ access, DMA write |
| 0 | 1 | 1 | 1 | Reserved |
| 1 | 0 | 0 | 0 | STRB1_ access, program read |
| 1 | 0 | 0 | 1 | STRB1_ access, data read |
| 1 | 0 | 1 | 0 | STRB1_ access, DMA read |
| 1 | 0 | 1 | 1 | STRB1_ access, SIGI read |
| 1 | 1 | 0 | 0 | Reserved |
| 1 | 1 | 0 | 1 | STRB1_ access, data write |
| 1 | 1 | 1 | 0 | STRB1_ access, DMA write |
| 1 | 1 | 1 | 1 | Idle |
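External logic or a simulator could interpret the four status lines with a lookup such as the following sketch, which transcribes Table 1.2. The function name `decode_status` and the string format of the result are hypothetical.

```python
# Access type selected by STAT2..STAT0, as listed in Table 1.2.
ACCESS_TYPES = {
    0b000: "program read",
    0b001: "data read",
    0b010: "DMA read",
    0b011: "SIGI read",
    0b101: "data write",
    0b110: "DMA write",
}

def decode_status(stat: int) -> str:
    """Decode STAT3..STAT0 (STAT3 is the most significant bit)."""
    if not 0 <= stat <= 0b1111:
        raise ValueError("status is a 4-bit value")
    if stat == 0b1111:
        return "Idle"
    access = ACCESS_TYPES.get(stat & 0b111)
    if access is None:
        return "Reserved"
    strobe = "STRB1_" if stat & 0b1000 else "STRB0_"   # STAT3 selects the strobe
    return f"{strobe} access, {access}"

# Usage
example = decode_status(0b0101)
```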

[0117] Control signal LOCK_ in the logic “0” state signals that an interlocked access is under way. If LOCK_ is in the logic “1” state, an interlocked access is not under way.

[0118] In the memory map, the memory interface control registers are located at 000100000h for the global memory interface control register and 000100004h for the local memory interface control register. Since both the global and local memory interfaces are functionally identical for purposes of this discussion, references to the global memory interface also apply to the local memory interface. The global memory interface control register has bits defined in terms of logic “0”s and “1”s that control the global memory interface. The memory control register defines the page sizes used for the two strobes, when the strobes are active, wait states, and other similar operations that define the character of the global memory interface.

[0119] The bit field definition of the global memory interface control register is shown in FIG. 5e. Table 2.1 defines the register bits, the register bit names, and the register bit functions. The bit field definition of the local memory interface control register is shown in FIG. 5f. Register bit functions and locations are very similar to those of the global memory interface control register, thus Table 2.1 is adequate for describing the local memory interface control register.

TABLE 2.1

| Bit position | Bit name | Definition |
|---|---|---|
| 0 | CE0_ | Value of the external pin CE0_. The value is not latched. |
| 1 | CE1_ | Value of the external pin CE1_. The value is not latched. |
| 2 | DE_ | Value of the external pin DE_. The value is not latched. |
| 3 | AE_ | Value of the external pin AE_. The value is not latched. |
| 4-5 | STRB0 SWW | Software wait state generation for STRB0_ accesses. In conjunction with STRB0 WTCNT, this field defines the mode of wait-state generation. |
| 6-7 | STRB1 SWW | Software wait state generation for STRB1_ accesses. In conjunction with STRB1 WTCNT, this field defines the mode of wait-state generation. |
| 8-10 | STRB0 WTCNT | Software wait-state count for STRB0_ accesses. This field specifies the number of cycles to use when software wait-states are active. The range is zero (STRB0 WTCNT = 000) to seven (STRB0 WTCNT = 111). |
| 11-13 | STRB1 WTCNT | Software wait-state count for STRB1_ accesses. This field specifies the number of cycles to use when software wait-states are active. The range is zero (STRB1 WTCNT = 000) to seven (STRB1 WTCNT = 111). |
| 14-18 | STRB0 PAGESIZE | Page size for STRB0_ accesses. Specifies the number of most significant bits (MSBs) of the address to be used to define the bank size for STRB0_ accesses. |
| 19-23 | STRB1 PAGESIZE | Page size for STRB1_ accesses. Specifies the number of MSBs of the address to be used to define the bank size for STRB1_ accesses. |
| 24-28 | STRB ACTIVE | Specifies the address ranges over which STRB0_ and STRB1_ are active. |
| 29 | STRB SWITCH | When STRB SWITCH is 1, a single cycle is inserted between back to back reads which switch from STRB0_ to STRB1_ (or STRB1_ to STRB0_). When STRB SWITCH is 0, no cycle is inserted between these back to back reads. |
| 30-31 | Reserved | Read as 0. |
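The layout in Table 2.1 can be unpacked from a 32-bit register value with shifts and masks. The sketch below assumes that bit position 0 is the least significant bit of the register; the names `FIELDS` and `decode_control_register` are hypothetical.

```python
from typing import Dict

# (lowest bit position, width in bits) for each field of Table 2.1.
FIELDS = {
    "CE0_": (0, 1),
    "CE1_": (1, 1),
    "DE_": (2, 1),
    "AE_": (3, 1),
    "STRB0 SWW": (4, 2),
    "STRB1 SWW": (6, 2),
    "STRB0 WTCNT": (8, 3),
    "STRB1 WTCNT": (11, 3),
    "STRB0 PAGESIZE": (14, 5),
    "STRB1 PAGESIZE": (19, 5),
    "STRB ACTIVE": (24, 5),
    "STRB SWITCH": (29, 1),
}

def decode_control_register(value: int) -> Dict[str, int]:
    """Split a 32-bit memory interface control register value into its fields.

    Assumes bit position 0 is the least significant bit.
    """
    return {name: (value >> low) & ((1 << width) - 1)
            for name, (low, width) in FIELDS.items()}

# Usage: STRB ACTIVE = 11110, STRB1 PAGESIZE = 10011, STRB0 PAGESIZE = 00111,
# STRB0 WTCNT = 5, STRB SWITCH = 1
value = (1 << 29) | (0b11110 << 24) | (0b10011 << 19) | (0b00111 << 14) | (5 << 8)
fields = decode_control_register(value)
```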

[0120] Table 2.2 illustrates the relationship between STRB ACTIVE and the address ranges over which STRB0_ and STRB1_ are active, and the size of the address range over which STRB0_ is active. The STRB ACTIVE field controls global peripheral port 24, and the LSTRB ACTIVE field controls local peripheral port 26. Table 2.3 illustrates the relationship between LSTRB ACTIVE and the address ranges over which LSTRB0_ and LSTRB1_ are active, and the size of the address range over which LSTRB0_ is active.

TABLE 2.2

| STRB ACTIVE field | STRB0_ active address range | STRB0_ active address range size | STRB1_ active address range |
|---|---|---|---|
| 11111 | Reserved | Reserved | Reserved |
| 11110 | 80000000-FFFFFFFF | 2³¹ = 2G | None |
| 11101 | 80000000-BFFFFFFF | 2³⁰ = 1G | C0000000-FFFFFFFF |
| 11100 | 80000000-9FFFFFFF | 2²⁹ = 512M | A0000000-FFFFFFFF |
| 11011 | 80000000-8FFFFFFF | 2²⁸ = 256M | 90000000-FFFFFFFF |
| 11010 | 80000000-87FFFFFF | 2²⁷ = 128M | 88000000-FFFFFFFF |
| 11001 | 80000000-83FFFFFF | 2²⁶ = 64M | 84000000-FFFFFFFF |
| 11000 | 80000000-81FFFFFF | 2²⁵ = 32M | 82000000-FFFFFFFF |
| 10111 | 80000000-80FFFFFF | 2²⁴ = 16M | 81000000-FFFFFFFF |
| 10110 | 80000000-807FFFFF | 2²³ = 8M | 80800000-FFFFFFFF |
| 10101 | 80000000-803FFFFF | 2²² = 4M | 80400000-FFFFFFFF |
| 10100 | 80000000-801FFFFF | 2²¹ = 2M | 80200000-FFFFFFFF |
| 10011 | 80000000-800FFFFF | 2²⁰ = 1M | 80100000-FFFFFFFF |
| 10010 | 80000000-8007FFFF | 2¹⁹ = 512K | 80080000-FFFFFFFF |
| 10001 | 80000000-8003FFFF | 2¹⁸ = 256K | 80040000-FFFFFFFF |
| 10000 | 80000000-8001FFFF | 2¹⁷ = 128K | 80020000-FFFFFFFF |
| 01111 | 80000000-8000FFFF | 2¹⁶ = 64K | 80010000-FFFFFFFF |
| 01110-00000 | Reserved | Reserved | Reserved |

[0121]

TABLE 2.3

| LSTRB ACTIVE field | LSTRB0_ active address range | LSTRB0_ active address range size | LSTRB1_ active address range |
|---|---|---|---|
| 11111 | Reserved | Reserved | Reserved |
| 11110 | 00000000-7FFFFFFF | 2³¹ = 2G | None |
| 11101 | 00000000-3FFFFFFF | 2³⁰ = 1G | 40000000-7FFFFFFF |
| 11100 | 00000000-1FFFFFFF | 2²⁹ = 512M | 20000000-7FFFFFFF |
| 11011 | 00000000-0FFFFFFF | 2²⁸ = 256M | 10000000-7FFFFFFF |
| 11010 | 00000000-07FFFFFF | 2²⁷ = 128M | 08000000-7FFFFFFF |
| 11001 | 00000000-03FFFFFF | 2²⁶ = 64M | 04000000-7FFFFFFF |
| 11000 | 00000000-01FFFFFF | 2²⁵ = 32M | 02000000-7FFFFFFF |
| 10111 | 00000000-00FFFFFF | 2²⁴ = 16M | 01000000-7FFFFFFF |
| 10110 | 00000000-007FFFFF | 2²³ = 8M | 00800000-7FFFFFFF |
| 10101 | 00000000-003FFFFF | 2²² = 4M | 00400000-7FFFFFFF |
| 10100 | 00000000-001FFFFF | 2²¹ = 2M | 00200000-7FFFFFFF |
| 10011 | 00000000-000FFFFF | 2²⁰ = 1M | 00100000-7FFFFFFF |
| 10010 | 00000000-0007FFFF | 2¹⁹ = 512K | 00080000-7FFFFFFF |
| 10001 | 00000000-0003FFFF | 2¹⁸ = 256K | 00040000-7FFFFFFF |
| 10000 | 00000000-0001FFFF | 2¹⁷ = 128K | 00020000-7FFFFFFF |
| 01111 | 00000000-0000FFFF | 2¹⁶ = 64K | 00010000-7FFFFFFF |
| 01110-00000 | Reserved | Reserved | Reserved |

[0122] FIG. 5g shows the effect of STRB ACTIVE on the memory map of the global memory bus. Part (a) shows a condition with the STRB ACTIVE field=11110. In this configuration, STRB0_ is active over the entire address range of the global memory bus. Part (b) shows a condition with the STRB ACTIVE field=10101. In this configuration, STRB0_ is active for addresses 80000000h-803FFFFFh and STRB1_ is active for addresses 80400000h-FFFFFFFFh.
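Tables 2.2 and 2.3 share one rule: an ACTIVE value f gives the first strobe a range of 2^(f+1) words starting at the bottom of the interface's 2G address range, and the second strobe receives the remainder. The following sketch computes both ranges; the function name `strobe_ranges` and the `base` parameter are hypothetical, with the base addresses 80000000h (global) and 00000000h (local) read from the tables.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]

def strobe_ranges(active_field: int,
                  base: int = 0x80000000) -> Optional[Tuple[Range, Optional[Range]]]:
    """Return ((strb0_first, strb0_last), strb1_range) for an ACTIVE field value.

    base is 0x80000000 for the global port (Table 2.2) and 0 for the local
    port (Table 2.3). Returns None for reserved encodings.
    """
    if not 0b01111 <= active_field <= 0b11110:
        return None
    size = 1 << (active_field + 1)           # size of the first strobe's range
    top = base + 0x7FFFFFFF                  # each interface spans 2G words
    strb0 = (base, base + size - 1)
    strb1 = (base + size, top) if base + size <= top else None
    return strb0, strb1

# Usage: the configuration of part (b) of FIG. 5g
part_b = strobe_ranges(0b10101)
```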

[0123] The distinction between global and local interface signals STRB0_ and STRB1_ is dropped except where it is needed for the sake of clarity. It should be noted that signal names shown in the Figures with suffix “-” are equivalent to corresponding signal names with suffix “_”. FIG. 6a shows that STRB_ transitions on the falling edge of H1. RDY_ is sampled on the falling edge of H1. Other general guidelines that apply to FIGS. 6b to 6i aid in understanding the illustrated logical timing diagrams of the parallel external interfaces:

[0124] 1. Changes of R/W_ are framed by STRB_.

[0125] 2. A page boundary crossing for a particular STRB_ results in thecorresponding PAGE signal going high for one cycle.

[0126] 3. R/W_ transitions are made on an H1 rising.

[0127] 4. STRB_ transitions are made on an H1 falling.

[0128] 5. RDY_ is sampled on an H1 falling.

[0129] 6. On a read, data is sampled on an H1 falling.

[0130] 7. On a write, data is driven out on an H1 falling.

[0131] 8. On a write, data stops being driven on H1 rising.

[0132] 9. Following a read, the address, status and page signals change on H1 falling.

[0133] 10. Following a write, the address, status, and page signals change on H1 falling.

[0134] 11. The fetch of an interrupt vector over an external interface is identified by the status signals for that interface (STAT or LSTAT) as a data read.

[0135] 12. PAGE goes high, STRB_ goes high.

[0136] FIG. 6b illustrates a read, read, write sequence. All three accesses are to the same page and are STRB1_ accesses. Back to back reads to the same page are single-cycle accesses. On the transition from a read to a write, STRB_ goes high for one cycle in order to frame the R/W_ signal changing.

[0137] FIG. 6c illustrates that STRB_ goes high between back to back writes and between a write and a read to frame the R/W_ transition.

[0138] FIG. 6d illustrates that when going from one page to another on back to back reads, an extra cycle is inserted and the transition is signalled by PAGE going high for one cycle. Also, STRB1_ goes high for one cycle.

[0139] FIG. 6e illustrates that when a page switch occurs on back to back writes, an extra cycle is inserted and is signalled with PAGE high for one cycle.
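The behaviour described for FIGS. 6b to 6e can be summarised as two per-transition flags: whether STRB_ goes high between two consecutive accesses, and whether PAGE goes high. The sketch below is a loose reading of those paragraphs only; it does not model cycle counts or wait states, and the function name `transition_flags` and the tuple encoding of an access are hypothetical.

```python
from typing import Dict, List, Tuple

Access = Tuple[str, int]   # ("R" or "W", page number)

def transition_flags(accesses: List[Access]) -> List[Dict[str, bool]]:
    """For each pair of consecutive accesses on one strobe, report which
    signals go high between them, following FIGS. 6b-6e as described.

    - Back to back reads to the same page: nothing is inserted.
    - Any transition involving a write: STRB_ goes high to frame R/W_.
    - Any page switch: PAGE goes high for one cycle, and STRB_ goes high.
    """
    flags = []
    for (prev_kind, prev_page), (kind, page) in zip(accesses, accesses[1:]):
        page_switch = page != prev_page
        same_page_reads = prev_kind == "R" and kind == "R" and not page_switch
        flags.append({"strb_high": not same_page_reads, "page_high": page_switch})
    return flags

# Usage: the read, read, write sequence of FIG. 6b (all on one page)
fig_6b = transition_flags([("R", 0), ("R", 0), ("W", 0)])
```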

[0140] Other combinations of write, read and page manipulations are shown in the following FIGS. 6f to 6i.

[0141] FIG. 6f illustrates a write same page followed by a read different page and a write different page sequence.

[0142] FIG. 6g illustrates a read different page followed by a read different page and a write same page sequence.

[0143] FIG. 6h illustrates a write different page followed by a write different page and a read same page sequence.

[0144] FIG. 6i illustrates a read same page followed by a write different page and a read different page sequence.

[0145] Peripheral port 25 is also constructed similarly to peripheral port 24, but is controlled by lines SEL₂₅ independently from peripheral port 24, so that communication at peripheral ports 24, 25 and 26 can occur simultaneously and independently, so long as the buses 30, 34 and 38 used by the ports are not simultaneously used. Peripheral port 25 is primarily useful in communication with peripheral devices connected to peripheral bus 28. Accordingly, as shown in FIG. 5, peripheral port 25 contains multiplexers 105 and 106 corresponding to like components in peripheral port 24.

[0146] A number of control lines are driven by buffers 104 in peripheral port 25, also responsive to signals generated by controller 14 (on lines which are not shown, for purposes of clarity). These control lines output by peripheral port 25 include line R/W_, the “_” designation indicating active low, which specifies the direction of the flow of data through peripheral port 25. The control lines connected to peripheral port 25 further include line STRB_ (as in line R/W_, the “_” designation indicating active low) driven by buffers 104 responsive to controller 14, which is a clock signal indicating to external memory that the set of address lines 30a, 30b, 34a or 38a connected to lines A_(n), as the case may be, is presenting a valid address signal to address memory. Line RDY_ is an input to microcomputer 10 from peripheral devices of peripheral bus 28 and, when driven to its low logic state, indicates that a peripheral device of peripheral bus 28 connected to data lines D_(n), address lines A_(n), and control lines R/W_ and STRB_ has completed a communication cycle with microcomputer 10. Controller 14 responds to the RDY_ signal to cause peripheral port 25 to drive said lines to valid states other than that directed to the communication cycle which had ended with the RDY_ signal low. It should be noted that, because of the plurality of buses 30, 34, and 38 connected to peripheral ports 24, 25 and 26, peripheral ports 24, 25 and 26 can be operating simultaneously.

[0147] The preferred embodiment of microcomputer 10, as noted earlier, utilizes a single memory address space for all of the memories 16, 18 and 20, including the addresses of memory external to microcomputer 10 and accessible via peripheral ports 24, 25 and 26. Table 3 shows the memory map of microcomputer 10 according to the preferred embodiment of the instant invention.

TABLE 3

| Address range (hexadecimal) | Location/function |
|---|---|
| 000000000 through 000000FFF | ROM 20 |
| 000100000 through 0001000FF | I/O & other memory mapped registers |
| 0002FF800 through 0002FFBFF | RAM 16 |
| 0002FFC00 through 0002FFFFF | RAM 18 |
| 000300000 through 0FFFFFFFF | External memory |
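A lookup against the single address space of Table 3 could be sketched as follows. The name `locate` is hypothetical, and returning `None` for addresses that fall between the listed ranges is an assumption, since the table does not say how those addresses are treated.

```python
from typing import Optional

# (first address, last address, location/function), transcribed from Table 3.
MEMORY_MAP = [
    (0x000000000, 0x000000FFF, "ROM 20"),
    (0x000100000, 0x0001000FF, "I/O & other memory mapped registers"),
    (0x0002FF800, 0x0002FFBFF, "RAM 16"),
    (0x0002FFC00, 0x0002FFFFF, "RAM 18"),
    (0x000300000, 0x0FFFFFFFF, "External memory"),
]

def locate(address: int) -> Optional[str]:
    """Return the Table 3 entry that contains address, or None if unlisted."""
    for first, last, name in MEMORY_MAP:
        if first <= address <= last:
            return name
    return None

# Usage: the global memory interface control register lies in the I/O range
where = locate(0x000100000)
```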

[0148] Referring now to FIG. 7a, the construction and operation of controller 14 is described in detail. Controller 14 serves the purpose of controlling the operation of the rest of microcomputer 10, so that the desired operation specified by the instruction codes is properly executed.

[0149] Clock generator 200 in controller 14 is connected to terminals X1 and X2 and generates the internal clock signals which are used in microcomputer 10, for example the system clock on line CLKIN. If a crystal is connected between terminals X1 and X2, clock generator 200 will, by way of an internal oscillator, generate the system clock signal on line CLKIN. Alternatively, an externally-generated clock can be applied to terminal X2, in which case the externally-generated clock signal will generate (such as by a divide-by-n in clock generator 200, not shown) the system clock signal on line CLKIN. Clock generator 200 further generates clock signals Q1 and Q2, which occur on the first and third quarter-cycles of the period of the clock signal on line CLKIN, however generated; clock signals Q1 and Q2 are used by memory access arbitration logic 206 in controller 14, as described below. Additionally, clock signals H1 and H3 are generated and applied to the external terminals of microcomputer 10. Clock signals H1 and H3 have periods equal to twice that of CLKIN. However generated, clock signals H1 and H3 are used by the communication ports, the CPU and other internal devices, and externally connected devices. Relative to the fetching of instruction codes and the control of microcomputer 10 responsive to such instruction codes, controller 14 contains program counter 92, instruction register 94, control logic 202, and program counter control logic 204. Program counter 92 is a thirty-two bit register, having an output connected to address lines 34a of program bus 34. The function of program counter 92 is to store the memory address of the next instruction to be fetched, decoded, and executed by microcomputer 10. In an instruction fetch cycle (which occurs during one period of the clock signal H3), the contents of program counter 92 are placed upon address lines 34a of program bus 34, and the one of memories 16, 18 or 20 (or external memory) containing the memory location corresponding to the address signal presents the addressed contents onto data lines 34d of program bus 34; the contents of the memory location having the address contained in program counter 92 constitute the instruction code of the next instruction to be decoded. Instruction register 94 is a thirty-two bit register which is connected to data lines 34d of program bus 34, and which receives the contents of the memory location addressed by program counter 92 during the fetch cycle.

[0150] During the decode cycle, occurring in the next period of the system clock signal on line H3 after the fetch cycle, the contents of instruction register 94 are decoded by control logic 202, to generate control signals going from controller 14 to the functional circuits of microcomputer 10. To accomplish this, a first portion of control logic 202 contains combinatorial logic for decoding the instruction code. Such combinatorial logic (shown as logic 202a in FIG. 4) can be realized in different well-known ways, such as a programmable logic array or a read-only memory. The thirty-two bit instruction code from instruction register 94 is thus decoded by combinatorial logic 202a into multiple output lines. Some of these lines are directly connected to functions outside of control logic 202, such as to program counter control logic 204; others of these lines are input into sequential logic 202b within control logic 202. Sequential logic 202b is operative to control the various functions of microcomputer 10 so as to allow the reading of data operands from memory by CPU 12, and so as to control the execution of the data processing operations on said operands by CPU 12. Sequential logic 202b accomplishes this by way of additional output lines emanating therefrom. The logic states of the output lines from control logic 202, whether from combinatorial logic 202a or sequential logic 202b, are thus determined by the instruction code received by control logic 202 from instruction register 94. It should be noted that the drawing figures referred to herein do not show the connection of these control lines between controller 14 and such functional circuitry, for purposes of clarity.

[0151] It is therefore apparent that combinatorial logic 202a in control logic 202 can be decoding an instruction code which was stored in instruction register 94 while controller 14 is causing the fetch of the following instruction from memory. In addition, sequential logic 202b is operative to control the operand read for a given instruction simultaneously with the control of the execution of a previously fetched instruction. Accordingly, control logic 202 can be controlling microcomputer 10 in such a manner that portions of four different instruction codes may be carried out simultaneously. Such “pipelining” of the instruction codes reduces the time required to perform a given sequence of instructions.

[0152] FIG. 7b illustrates an example of how the pipeline is filled, and accordingly how the pipeline operates for a typical instruction. In the first cycle of the system clock signal on line H3, instruction n is being fetched by controller 14, for example from one of memories 16, 18 or 20. During the fetch cycle, however, program counter control logic 204 has incremented the contents of program counter 92 to contain the memory location of the instruction code for instruction n+1. During the second cycle of the system clock signal on line H3, the instruction code for instruction n is being decoded by control logic 202. Also during this second cycle, the contents of program counter 92 are presented to address lines 34a of program bus 34, and the instruction code for instruction n+1 is fetched from program memory and loaded into instruction register 94.

[0153] During the third system clock cycle shown in FIG. 7b, sequential logic 202b is effecting a read from memory (e.g., RAM 16) of a data operand necessary for instruction n via data bus 30. In addition, since the instruction code for instruction n+1 has been fetched, the third cycle shown in FIG. 7b illustrates that instruction n+1 is being decoded by combinatorial logic 202a of control logic 202. Simultaneously with the read cycle for instruction n, however, the fetch of the instruction code for instruction n+2 is being done, assuming there is no bus or memory conflict with the read cycle for instruction n. As described above, generally the data operand is read by CPU 12 via data bus 30 while the instruction code is read via program bus 34; assuming that the two reside in different memories 16, 18 or 20, or that one resides in external memory, no bus conflict will occur.

[0154] During the fourth cycle of the system clock, instruction n will be executed under the control of sequential logic 202b in control logic 202, the read operation for instruction n+1 will be effected by sequential logic 202b, the instruction code for instruction n+2 will be decoded, and the instruction code for instruction n+3 will be fetched. Accordingly, the pipeline for microcomputer 10 will be filled, and the performance of a sequence of instructions will be optimal, subject to bus conflicts and to memory access conflicts which may, for certain instruction combinations, cause a wait cycle for one of the operations.
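The filling of the four-stage pipeline described above can be tabulated with a short sketch. It assumes an ideal pipeline with no bus or memory conflicts and hence no wait cycles; the names `STAGES` and `pipeline_table` are hypothetical, and instruction index 0 stands for instruction n.

```python
from typing import Dict, List, Optional

STAGES = ["fetch", "decode", "read", "execute"]

def pipeline_table(num_instructions: int, num_cycles: int) -> List[Dict[str, Optional[int]]]:
    """For each clock cycle, list which instruction occupies each stage.

    Instruction i enters the fetch stage in cycle i and advances one stage
    per cycle. None marks an empty stage. No conflicts are modelled.
    """
    table = []
    for cycle in range(num_cycles):
        row: Dict[str, Optional[int]] = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth
            row[stage] = instr if 0 <= instr < num_instructions else None
        table.append(row)
    return table

# Usage: the first four cycles, as in FIG. 7b (0 is n, 1 is n+1, and so on)
table = pipeline_table(num_instructions=4, num_cycles=4)
```

In the fourth row every stage is occupied, which corresponds to the filled pipeline of paragraph [0154].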

[0155] Data lines 30d of data bus 30 are received by controller 14, for control of the program flow in other than incremental fashion, such as a branch instruction, requiring that program counter 92 be loaded by CPU 12 or from memory. For example, in the event of an unconditional branch, the value of an operand contained in the instruction code, read from memory, or read from a register in CPU 12 may contain the address of the memory location containing the next instruction code to be executed. Program counter control logic 204 will then receive the value presented upon data lines 30d, and load program counter 92 accordingly, so that program control can pass to the desired location.

[0156] As illustrated in FIG. 7a, program counter control logic 204 contains an adder 203 which receives the contents of program counter 92. Control logic 202 (preferably combinatorial logic 202 a therein) controls adder 203 so that generation of the contents of program counter 92 for the next cycle may be performed in a variety of manners. As explained above, adder 203 may merely increment the prior contents of program counter 92, to step through the instruction sequence. However, program counter control logic 204 further contains a register 205, which can receive a value from data lines 30 d of data bus 30. Program counter control logic 204 can thus calculate the contents of program counter 92 in various ways. For example, branching to a relative address (relative to program counter 92) may occur by way of loading register 205 with a value presented on data lines 30 d of data bus 30; this value could then be added to the prior contents of program counter 92 to generate a new value for program counter 92. In addition, an absolute branch may be effected by loading register 205 with the desired memory address from data lines 30 d of data bus 30, and by control logic 202 causing adder 203 to perform a “zero-add” with the contents of register 205 for application to program counter 92.
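The three ways of generating the next program counter value described above may be sketched as follows. This is an illustration only: the function name `next_pc`, the mode strings, and the 32-bit wrap-around are assumptions made for the example and are not taken from the figures.

```python
# Minimal sketch of adder 203 producing the next value of program counter 92.
# Assumption: addresses wrap at 32 bits; mode names are hypothetical.
MASK32 = 0xFFFFFFFF


def next_pc(pc, mode="increment", reg205=0):
    """Compute the next program counter value for one of three modes."""
    if mode == "increment":
        # Step through the instruction sequence.
        return (pc + 1) & MASK32
    if mode == "relative":
        # Add the displacement held in register 205 to the prior PC.
        return (pc + reg205) & MASK32
    if mode == "absolute":
        # "Zero-add": register 205 passes through the adder unchanged.
        return (0 + reg205) & MASK32
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    print(hex(next_pc(0x100)))
    print(hex(next_pc(0x100, "relative", -4)))
    print(hex(next_pc(0x100, "absolute", 0x2000)))
```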

[0157] It should be further noted that microcomputer 10 is capable of performing a “delayed” branch instruction, so that the branch instruction is fetched three instructions before it is actually to occur. The delayed branch instruction, when executed, loads register 205 with the destination memory address of the branch as in a direct branch. However, control logic 202 will continue to increment the contents of program counter 92 for the next three instructions following the execution of the delayed branch instruction. Upon the third instruction, adder 203 will apply the contents of register 205 to program counter 92, thereby effecting the branch while continuing to take advantage of the pipeline scheme. The pipeline may, of course, remain full after the branch, as the destination location may continue to be incremented as before by adder 203.
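The effect of the delayed branch on the sequence of fetched addresses can be sketched as below. The model is deliberately simplified: it tracks only fetch addresses, treats the branch as unconditional, and the name `fetch_trace` and its parameters are hypothetical.

```python
# Minimal sketch of a delayed branch: after the branch instruction is fetched,
# three more sequential instructions are fetched before the target is applied.
# Assumption: one instruction per address and an unconditional branch.
def fetch_trace(start, branch_addr, target, total, delay_slots=3):
    """Return the list of fetched addresses for `total` fetches."""
    trace = []
    pc = start
    pending = None  # sequential fetches left before the branch takes effect
    while len(trace) < total:
        trace.append(pc)
        if pending is None and pc == branch_addr:
            pending = delay_slots      # register 205 now holds the target
            pc += 1
        elif pending is not None:
            pending -= 1
            if pending == 0:
                pc = target            # adder applies register 205 to the PC
                pending = None
            else:
                pc += 1
        else:
            pc += 1
    return trace


if __name__ == "__main__":
    print(fetch_trace(start=10, branch_addr=11, target=50, total=8))
```

With the values in the usage line, the three instructions following the branch (addresses 12, 13 and 14) are still fetched before fetching resumes at the target, so no fetch slot is wasted.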

[0158] Trap routines are supported in the preferred embodiment. Referring to FIG. 8a, trap instructions differ from branch instructions in that trap instructions entail indirect addressing to arrive at the trap routine address, while branch instructions entail relative addressing (which is less involved) to arrive at the branch address. As a result, the throughput of a pipelined machine suffers from the indirection that occurs when arriving at the trap routine address: to execute a trap sequence, no instructions are fetched for the next three stages after a trap fetch, because the address for the trap routine has not yet been determined. Consequently, the pipeline is flushed whenever a trap instruction is executed. It should be noted that often when invoking a trap routine, it is advantageous to disable interrupts and freeze cache memory. The inherent nature of trap routines is in many circumstances incompatible with interrupts and cache memory, and improvements herein remedy such problems.

[0159] A delayed trap instruction (LAT) incorporated in the preferred embodiment remedies the undesirable effects of executing a trap routine. The LAT instruction is fetched three cycles before the trap instruction is executed. FIG. 8b shows the sequence of events in relation to system clock cycles of microcomputer 10. During system clock cycle 610 the LAT instruction is fetched from program memory. Decode cycle 620 decodes the LAT instruction. Instructions are being fetched while the LAT instruction is executing, thus maintaining the data flow from the pipeline. During the third system clock cycle 630, the address of the first instruction of the trap routine is fetched from memory. The memory can be any one of the memories discussed herein. Clock cycle 640 saves the contents of the program counter (INS+4 representing the next instruction) to register PC+4 and loads the fetched trap address into the program counter. Thus, during the next system clock cycle, the first instruction of the trap routine is fetched from the memory. With the LAT instruction, one system clock cycle is used to initiate the trap sequence, thus maintaining a constant data flow from the pipeline. The program counter value representing the next instruction is stored before loading the address of the first instruction of the trap routine, thus ensuring that program execution resumes at the point reached prior to executing the trap routine.

[0160] FIG. 8c shows a trap vector table which contains trap addresses (TA) that correspond to locations for the first instruction of trap routines. The trap address is the sum of the trap vector table pointer (TVTP) and trap number N (TN). The summing of the TVTP and TN occurs during system clock cycle 620. Control logic 202 decodes the LAT instruction fetched during system clock cycle 610 and instructs adder 209 to sum operands TVTP and TN during system clock cycle 620.

[0161] For example, shown in FIG. 7a is trap address logic 208 containing trap vector table pointer register 207, adder 209, and program counter+4 (PC+4) register 210. During system clock cycle 620 (after fetching the LAT instruction), control logic 202 decodes the LAT instruction. Trap number (TN), which specifies a particular trap routine, is extracted from the LAT instruction by decoder 202 a and combined with the contents of trap vector table pointer (TVTP) register 207 using adder 209. The result is a trap address (TA) specifying a location in memory that contains the trap vector, which is the address of the first instruction for the trap routine to be executed. The contents of TVTP register 207 can be altered, thus offering even more flexibility in placing trap routines within the memory map of microcomputer 10. During the third cycle of the system clock after fetching the LAT instruction, the trap address is sent to memory via bus 30 a to access the trap vector, which is received on bus 30 d. Access to memory is in accordance with the technique described hereinabove. On the fourth cycle of the system clock, the current contents of program counter register 92 are transferred to PC+4 register 210 and the trap vector is transferred to program counter 92. Thus, program counter register 92 contains the address of the first instruction of the trap routine, and the previous contents of program counter register 92 are stored in PC+4 register 210. When the trap routine is complete, the contents of PC+4 register 210 are transferred back to program counter register 92 and program execution resumes at the point where the trap routine interrupted. Advantageously, the trap routine interrupts program execution using only one system clock cycle and continues to take advantage of the pipelining scheme by keeping the pipeline full while indirection of program execution is occurring.
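The address arithmetic of the LAT sequence can be illustrated with a short sketch. Memory is modeled here as a dictionary, and the function names and the 32-bit wrap-around are assumptions made for illustration only.

```python
# Minimal sketch of the LAT address path: TA = TVTP + TN, then the trap
# vector read from memory at TA replaces the program counter.
# Assumption: memory is a dict of address -> word; names are hypothetical.
def trap_address(tvtp, trap_number):
    """Adder 209: sum the trap vector table pointer and the trap number."""
    return (tvtp + trap_number) & 0xFFFFFFFF


def execute_lat(pc, tvtp, trap_number, memory):
    """Return (new program counter, value saved in the PC+4 register)."""
    ta = trap_address(tvtp, trap_number)
    trap_vector = memory[ta]   # indirection: the vector is read from memory
    saved_pc = pc              # prior PC contents go to PC+4 register 210
    return trap_vector, saved_pc


if __name__ == "__main__":
    memory = {0x1005: 0x8000}  # illustrative trap vector table entry
    new_pc, saved = execute_lat(pc=0x204, tvtp=0x1000, trap_number=5,
                                memory=memory)
    print(hex(new_pc), hex(saved))
```

Returning from the trap routine then amounts to copying the saved value back into the program counter.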

[0162] FIG. 8d shows the flow chart of the steps used in the execution of the link and trap (LAT) instruction incorporated in the preferred embodiment of microcomputer 10. If condition 171 is not satisfied, normal operation continues. If condition 171 is satisfied, the interrupt and cache status is saved 172, and the cache is frozen and the interrupt disabled 173. The program counter of the LAT plus Nth instruction is saved 174, after which the program counter is loaded 175 with the trap vector containing the address of the first instruction of the trap routine. The LAT trap routine is then executed 176. After execution of the trap routine, the interrupt and cache status are restored 177, whereby the cache is no longer frozen (assuming it was not frozen before the LAT) and the interrupt is no longer disabled (assuming it was not disabled before the LAT). Upon successful completion of these steps, normal operation continues 178 as if the condition had never been satisfied.

[0163] U.S. patent application Ser. No. 347,967 (TI Docket 14145), which is incorporated herein by reference, gives more details about the operation of conditional instructions.

[0164] A repeat block delayed instruction (RPTBD) is incorporated in the preferred embodiment. Advantages of the RPTBD instruction are substantially the same as those of the delayed branch and trap instructions: single system clock cycle execution and maintaining throughput by not flushing the pipeline. A distinct instruction called a repeat block instruction (RPTB) (without delay) is also implemented and allows a block of instructions to be repeated a number of times without penalty for looping; however, in RPTB the pipeline is flushed while the values of the repeat start (RS) and repeat end (RE) registers contained in block repeat register 164 are being determined. It should be noted that the repeat count (RC) register (contained in block repeat register 164) is loaded before executing the RPTB instruction.

[0165] The repeat block delayed instruction (RPTBD), compared to RPTB, advantageously further fetches the next three instructions before the rest of the RPTBD instruction is executed. FIG. 8e shows the sequence of events in relation to the system clock cycles of microcomputer 10. During system clock cycle 650, the RPTBD instruction is fetched from program memory. Decode cycle 660 decodes the RPTBD instruction. Instructions are continually fetched while the RPTBD instruction is cycled through the pipeline. During the third system clock cycle 670, the decoded RPTBD instruction containing data that is used to determine the repeat end (RE) address for the block of instructions is sent to CPU 12. Clock cycle 680 causes CPU 12 to calculate the repeat end (RE) address. During clock cycle 690 the program counter (PC) is loaded into repeat start (RS) register 223, signaling the start of the RPTBD instruction; thus, the first instruction of the repeat block is fetched from the memory. The block of instructions is repeated until the number in the repeat count (RC) register is reached. Program execution then continues. The pipeline is not flushed because the RPTBD instruction is fetched three system clock cycles before executing the repeat block delay (RPTBD) instruction. A constant data flow from the pipeline is maintained.

[0166] For example, shown in FIG. 8f is repeat block delay logic 220 located in CPU 12. Contained within repeat block delay logic 220 is repeat block register 164. It should be noted that repeat count (RC) register 221 is loaded with a proper value. An RPTBD instruction is loaded into instruction register 94 and is decoded. Data and control signals are sent to CPU 12 along with the contents of program counter 92 (PC), where the data and PC are combined and stored in repeat end (RE) register 222. A signal on line STORE from controller 14 places the contents of PC (92) to repeat start (RS) register 223 via repeat start (RS). Each time the program counter (PC) is incremented during the execution of the block of instructions, comparator 224 compares the value of the PC with RE to determine whether PC equals the RE value. If not, then PC via program bus 34 a fetches the next instruction. If PC equals RE, then comparator 224 checks whether the zero flag is set by repeat count (RC) register 221 via signal ZERO, signaling a zero count. If not, comparator 224 decrements RC by 1 via signal DECR, and a signal LOAD is sent to RS register 223, loading its contents to PC register 92. Thus, the contents of PC register 92 fetch the first instruction of the repeat block. The repeat block is repeated until the zero flag is set, signaling that the number of repetitions is complete. Then, PC is not loaded with the value in RS register 223, and PC is incremented past the RE value. Program execution continues.

[0167] FIG. 8g is a flow chart of the steps involved in implementing the RPTBD instruction. Operations commence with fetching of the RPTBD instruction in start block 225. Then step 226 decodes the RPTBD instruction. Next, step 227 calculates repeat end (RE). Then step 228 stores the value RE to the RE register, and PC is stored to the RS register. Step 229 begins execution of the block of instructions. Next, step 230 executes an instruction. Test step 231 determines whether PC equals RE. If not, operations branch to step 231 a to increment the PC and return to step 230 to execute another instruction. Otherwise (if so), operations proceed to test step 232 to determine whether RC=0. If not, then operations branch to step 232 a, decrementing RC by 1, and to step 232 b, loading RS to PC, before returning to execute the repeat block. Otherwise (if RC=0), operations proceed to step 233, whereupon PC is incremented to RE plus 1, completing the repeat block delay instruction, and program execution continues.
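The loop control of steps 230 through 233 can be sketched as follows. The sketch covers only the PC, RS, RE and RC bookkeeping, not the pipeline timing; the function name `run_repeat_block` is hypothetical, and the observation that a count of N yields N+1 passes follows from reading the flow chart literally rather than from any statement in the text.

```python
# Minimal sketch of the repeat block loop control (steps 230 to 233).
# Assumption: one instruction per address; only the address sequence is kept.
def run_repeat_block(rs, re, rc):
    """Return (list of executed addresses, final PC) for a repeat block."""
    if re < rs:
        raise ValueError("repeat end must not precede repeat start")
    executed = []
    pc = rs
    while True:
        executed.append(pc)   # step 230: execute an instruction
        if pc != re:          # step 231: PC equals RE?
            pc += 1           # step 231a: increment PC
            continue
        if rc != 0:           # step 232: RC equals zero?
            rc -= 1           # step 232a: decrement RC
            pc = rs           # step 232b: load RS into PC
            continue
        pc = re + 1           # step 233: continue past the block
        return executed, pc


if __name__ == "__main__":
    print(run_repeat_block(rs=100, re=102, rc=1))
```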

[0168] Controller 14 further includes interrupt logic 250, which is connected to a plurality of external terminals of microcomputer 10, to controller 14, and to various of the functions within microcomputer 10. Interrupt logic 250 serves the purpose of receiving interrupt signals presented to microcomputer 10 on the RESET terminal and on terminals INT0 through INT3, and receiving interrupt signals generated internally to microcomputer 10 from various functions such as DMA coprocessor 22. An example of such an internal interrupt signal is shown in FIG. 10 by line 312, which is an interrupt signal from DMA coprocessor 22. Contained within CPU 12 as a control register is an interrupt enable register, the contents of which specify whether each of the interrupt signals is enabled or disabled. Responsive to the receipt of an enabled interrupt signal, either from terminals INT0 through INT3 or from internal to microcomputer 10, and if controller 14 indicates that an access to an input/output memory location is not current, interrupt logic 250 will cause program counter 92 to be loaded with a memory address corresponding to the particular interrupt signal (the “interrupt vector”), and the execution of the program will continue from the interrupt vector location forward. Responsive to an instruction code generally included in the interrupt handling routine called by the interrupt vector, interrupt logic 250 generates interrupt acknowledge signals on line INTA for external interrupts and, for example, on line 314 for the internal interrupt signal for DMA coprocessor 22. Controller 14 causes the prior contents of program counter 92 to be stored in a predetermined memory location (generally called a “stack”), so that the location of the instruction code which would have been fetched next will be reloaded after the interrupt has been serviced.

[0169] External memory devices connected to peripheral port 25, for example, can be used to store the instruction codes for the program being executed by microcomputer 10. However, the access time of the external memory may be sufficiently slower than that of memories 16, 18 and 20 so that controller 14 would have to wait a full system clock period after presenting the contents of program counter 92 on address lines 34 a of program bus 34, before the instruction code would be presented by the external memory onto data lines 34 d of program bus 34 for receipt by instruction register 94. For any given instruction being executed, often the next instruction code to be executed is located in a memory location in program memory which has an address close to the address of the given instruction. Such proximity in program memory of the next instruction code occurs especially often in digital signal processing applications, because of the repetitive nature of the calculations therein. An instruction cache memory 36 as shown in FIG. 1 is one way to take advantage of this repetitive nature.

[0170] Instruction cache 36, as described above relative to FIG. 1, is connected to address lines 34 a and data lines 34 d of program bus 34. As shown in FIG. 9, instruction cache 36 contains 128-word memory 140 which is organized into four 32-word segments 140 a, 140 b, 140 c and 140 d. Instruction cache 36 further contains segment start registers 144 a, 144 b, 144 c, and 144 d, each of which stores a predetermined number of the most significant bits of the addresses for the instruction codes stored in the respective segments 140 a, 140 b, 140 c, and 140 d. In the preferred embodiment of the invention, since the address signal is thirty-two bits wide, and because each of segments 140 a, 140 b, 140 c and 140 d contains thirty-two (2^5) words, the number of bits stored by segment start registers 144 a, 144 b, 144 c and 144 d is twenty-seven. Associated with each of the thirty-two words stored in each of segments 140 a, 140 b, 140 c and 140 d is a flag bit 142 for indicating the presence of the instruction code within the corresponding word when set, and for indicating the absence of an instruction code therewithin when not set. MSB comparator 146 is connected to address lines 34 a, for comparing the twenty-seven most significant bits on address lines 34 a with the contents of the segment registers 144 a, 144 b, 144 c, and 144 d. LSB decoder 148 is also connected to address lines 34 a and, as will be discussed below, is for decoding the five least significant bits of the address lines 34 a. Input/output buffer 150 is connected between data lines 34 d and segments 140 a, 140 b, 140 c and 140 d, for controlling the output of instruction cache 36 to program bus 34. Instruction cache 36 further contains least-recently-used (LRU) stack 152 which points to segment registers 144 a, 144 b, 144 c and 144 d in the order in which they were most recently used.

[0171] In operation during a fetch cycle, where the memory address of the instruction code to be fetched does not reside in RAMs 16 or 18, or in ROM 20, but in external memory, MSB comparator 146 receives the twenty-seven most significant bits of the address signal on address lines 34 a of program bus 34, and compares them to the contents of segment registers 144 a, 144 b, 144 c and 144 d. In the event that a match is found, LSB decoder 148 then decodes the five least significant bits of the address signal on address lines 34 a, to select the one of flag bits 142 corresponding to the one of thirty-two words within either segment 140 a, 140 b, 140 c or 140 d of the full address signal on address lines 34 a. If the corresponding flag bit 142 is set, input/output buffer 150 will present the contents of the corresponding word within the matched segment 140 a, 140 b, 140 c or 140 d onto data lines 34 d of program bus 34, and the access of the instruction code stored in instruction cache 36 is completed. In addition, the segment register 144 a, 144 b, 144 c or 144 d which was matched is pointed to by the top of LRU stack 152, and the non-matching segment register 144 a, 144 b, 144 c or 144 d is pointed to by the bottom of LRU stack 152. The segment pointed to by the bottom of LRU stack 152 is the least recently used one of segments 140 a, 140 b, 140 c and 140 d, and will be the segment which is replaced in the event of a cache “miss”, as will be explained below.

[0172] In some applications, some of the words in segments 140 a, 140 b, 140 c and 140 d may not be loaded with instruction codes. Therefore, the possibility arises that the twenty-seven most significant bits on address lines 34 a of program bus 34 will match the contents of one of segment registers 144 a, 144 b, 144 c and 144 d, but the word within the matching one of segments 140 a, 140 b, 140 c or 140 d corresponding to the five least significant bits will not contain an instruction code. In this event, the flag bit 142 for the corresponding word is not set (i.e., contains a “0” logic state). This is a cache “miss”, and the instruction code for the corresponding address must be read from the addressed memory location in external memory; input/output buffer 150 will load the instruction code from data lines 34 d of program bus 34 into the corresponding word within the matched segment 140 a, 140 b, 140 c or 140 d, with the corresponding flag bit 142 being set to a “1” logic state. However, since the most significant bits matched one of segment registers 144 a, 144 b, 144 c and 144 d, the matching one of segment registers 144 a, 144 b, 144 c or 144 d will be pointed to by the top of LRU stack 152, and the other one of segment registers 144 a, 144 b, 144 c and 144 d will be pointed to by the bottom of LRU stack 152.

[0173] In the event that the twenty-seven most significant bits on address lines 34 a of program bus 34 match the contents of none of segment registers 144 a, 144 b, 144 c or 144 d, a cache “miss” also occurs. In this event, flag bits 142 will be reset for all words in the one of segments 140 a, 140 b, 140 c or 140 d which corresponds to the least recently used one of segments 140 a, 140 b, 140 c and 140 d, which is pointed to by the bottom of LRU stack 152. The twenty-seven most significant bits on address lines 34 a will then be stored into the segment register 144 a, 144 b, 144 c or 144 d for the least recently used one of segments 140 a, 140 b, 140 c or 140 d, and the instruction code received from external memory on data lines 34 d will be loaded into the corresponding one of the thirty-two words in the “new” segment corresponding to the five least significant bits on address lines 34 a, and its flag bit 142 will be set to a “1” state. The one of segment registers 144 a, 144 b, 144 c or 144 d containing the newly loaded instruction code will be pointed to by the top of LRU stack 152, with the other segment register 144 a, 144 b, 144 c or 144 d pointed to by the bottom of LRU stack 152.
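The hit and miss behavior described in the preceding paragraphs can be sketched with a small model of a four-segment cache. This is an illustration under stated assumptions: external memory is modeled as a dictionary, the class and method names are hypothetical, and the LRU stack is modeled as an ordered list with the most recently used segment first.

```python
# Minimal sketch of a four-segment, 32-words-per-segment instruction cache
# with per-word flag bits and least-recently-used segment replacement.
# Assumption: the upper address bits form the tag, the low 5 bits the word.
class InstructionCache:
    SEGMENTS = 4
    WORDS = 32  # 2**5 words per segment

    def __init__(self):
        self.tags = [None] * self.SEGMENTS                    # segment registers
        self.flags = [[False] * self.WORDS for _ in range(self.SEGMENTS)]
        self.words = [[0] * self.WORDS for _ in range(self.SEGMENTS)]
        self.lru = list(range(self.SEGMENTS))                 # [0] = top, [-1] = bottom

    def _touch(self, seg):
        """Move a segment to the top of the LRU stack."""
        self.lru.remove(seg)
        self.lru.insert(0, seg)

    def fetch(self, address, external_memory):
        """Return (instruction word, 'hit' | 'word miss' | 'segment miss')."""
        tag, offset = address >> 5, address & 0x1F
        if tag in self.tags:
            seg = self.tags.index(tag)
            self._touch(seg)
            if self.flags[seg][offset]:
                return self.words[seg][offset], "hit"
            result = "word miss"          # tag matched, flag bit not set
        else:
            seg = self.lru[-1]            # replace least recently used segment
            self.flags[seg] = [False] * self.WORDS
            self.tags[seg] = tag
            self._touch(seg)
            result = "segment miss"
        word = external_memory[address]   # read from external memory
        self.words[seg][offset] = word
        self.flags[seg][offset] = True
        return word, result


if __name__ == "__main__":
    memory = {addr: addr * 2 for addr in range(1024)}
    cache = InstructionCache()
    for addr in (0, 0, 1, 32, 64, 96, 128, 0):
        print(addr, cache.fetch(addr, memory))
```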

[0174] A status register is contained in CPU 12 (not shown). Three bits are contained within the status register which control the operation of instruction cache 36 in a manner apart from that described above. A first bit is the cache clear bit which, when set, resets all of flag bits 142, in effect clearing the contents of instruction cache 36. A second such control bit in the status register is the cache enable bit which, when set, enables operation of instruction cache 36; conversely, when the cache enable bit is not set, instruction cache 36 is disabled to the extent that it is in no way accessed, regardless of the address value on address lines 34 a. During such time that the cache enable bit is not set, the contents of segment registers 144 a, 144 b, 144 c and 144 d, flag bits 142, and the words within segments 140 a, 140 b, 140 c and 140 d themselves, are not alterable. The third such bit within the status register is the cache freeze bit. When the cache freeze bit is set, only fetches from instruction cache 36 are allowed in the event of cache “hits”. In the event of a cache “miss”, however, no modification of flag bits 142, segment registers 144 a, 144 b, 144 c and 144 d, or LRU stack 152 is performed; the instruction code fetch is merely performed from external memory without affecting instruction cache 36.

[0175] Referring now to FIGS. 1 and 10, the construction and operation of DMA coprocessor 22 will be described. Direct memory access operations are useful in moving blocks of stored data from one memory area to another without intervention of the central processing unit (e.g., CPU 12). For microcomputer 10 described herein, direct memory access is also useful for moving blocks of data between external memory and on-chip memories 16 and 18. As shown in FIGS. 1 and 8, DMA communications of data occur on DMA bus 38 and receipt of control and source/destination address information occur from peripheral bus 28.

[0176] It should be noted that peripheral bus 28 contains address lines 28 a and data lines 28 d, which carry address information and data, respectively, in the same manner as data bus 30, program bus 34, and DMA bus 38 discussed heretofore. Referring back to FIG. 1, it is apparent that address lines 28 a and data lines 28 d of peripheral bus 28 are directly connected, and therefore correspond, to the lines I/OAn and I/ODn, respectively, at the output of peripheral port 25. Accordingly, in order to present an address, or communicate data from or to, peripheral bus 28, the desired address is made to correspond to a value within an address space serviced by peripheral port 25. The memory-mapped registers within DMA coprocessor 22 which are described below are therefore within the memory address space 0001000A0h through 0001000FFh.

[0177] For purposes of clarity, the DMA coprocessor 22 shown in FIG. 10 shows in detail only one DMA channel 21. It should be noted that five additional DMA channels similar to DMA channel 21 are also incorporated in DMA coprocessor 22 of the preferred embodiment. DMA channel 21 has some registers that have a corresponding auxiliary register. Those auxiliary registers are used during split-mode operation, which splits one DMA channel to have separate source and destination paths that bind one half to the input FIFO and the other half to the output FIFO of a communication port. The channel utilizing the non-auxiliary registers is called the primary channel, and the channel utilizing the auxiliary registers for DMA transfers is called the auxiliary channel. Thus, the functions of the auxiliary registers are similar to their non-auxiliary counterparts. Auxiliary registers are used during split-mode operation and not used during unified mode. A detailed description of the split-mode operation is provided hereinbelow.

[0178] DMA channel 21 contains control register 300, transfer counter register 301, auxiliary count register 302, destination address register 303, destination index register 304, source address register 305, source index register 306, link pointer register 307 and auxiliary pointer 308, each of which is connected to address lines 28 a and data lines 28 d of peripheral bus 28 and each of which is mapped into a corresponding address location of the memory address space of microcomputer 10. DMA channel 21 further contains data register 309, which is connected to data lines 38 d of DMA bus 38. Address lines 38 a of DMA bus 38 are connected to destination address register 303, source address register 305, link pointer register 307 and auxiliary pointer 308. Control logic 310 is connected to control register 300 so that the contents of the bits therein will effect the control of DMA channel 21. Control logic 310 generates a signal to transfer counter register 301 and auxiliary count register 302 on lines DECR and DECRX respectively and receives a signal from transfer counter register 301 and auxiliary count register 302 on lines ZERO and ZEROX respectively. Control logic 310 provides a LOAD signal to destination address register 303 and source address register 305; control logic 310 further provides signals to data register 309 on lines WRITE and STORE. To effect the desired memory read/write operations, control logic 310 generates read/write signals which are connected to controller 14, so that controller 14 can generate such control signals to memories 16, 18 and 20, and to peripheral ports 24, 25 and 26, as discussed above relative to memory access control by controller 14.

[0179] Control register 300 is a thirty-two bit addressable registerwhich is written to in order to configure DMA channel 21. The DMAchannel 21 is very flexible as evident from the multitude of differentcontrol variations configurable by setting the bits in the variouspositions of control register 300 to either a logic “1” or “0” state.Each of the thirty-two control bits in the control register 300 aredescribed in detail in Table 4. TABLE 4 DMA Channel Control Register BitPosition Bit Definition  0-1 DMA PRI DMA PRIority. Defines thearbitration rules to be used when a DMA channel and the CPU arerequesting the same resource. Affects unified mode and the primarychannel in split mode.  2-3 TRANSFER Defines the transfer mode used bythe DMA channel. MODE Affects unified mode and the primary channel insplit mode.  4-5 AUX Defines the transfer mode used by DMA channel.TRANSFER Affects the auxiliary channel in split mode MODE only.  6-7SYNCH Determines the mode of synchronization to be MODE used whenperforming data transfers. Affects unified mode and the primary channelin split mode. If a DMA channel is interrupt driven for both reads andwrites, and the interrupt for the write comes before the interrupt forthe read, the interrupt for the write is latched by the DMA channel.After the read is complete, the write will be able to be done.  8 AUTOIf AUTO INIT STATIC = 0, the link pointer is INIT incremented duringautoinitialization. If AUTO STATIC INIT STATIC = 1, the link pointer isnot incremented (it is static) during autoinitialization. Affectsunified mode and the primary channel in split mode.  9 AUX AUTO If AUTOINIT STATIC = 0, the link pointer is INIT incremented duringautoinitialization. If AUTO INIT STATIC STATIC = 1, the link point isnot incremented (it is static) during autoinitialization. Affects theauxiliary channel in split mode only. 
It is useful to keep the link pointer constant when autoinitializing from the on-chip com ports or other stream-oriented devices such as FIFOs.
10 AUTOINIT SYNCH: If AUTOINIT SYNCH = 0, then the interrupt enabled by the DMA interrupt enable register in the CPU used for DMA reads is ignored and the autoinitialization reads are not synchronized with any interrupt signals. If AUTOINIT SYNCH = 1, then the interrupt enabled by the DMA interrupt enable register in the CPU used for DMA reads is also used to synchronize the autoinitialization reads. Affects unified mode and the primary channel in split mode.
11 AUX AUTOINIT SYNCH: Affects split mode only. If AUX AUTOINIT SYNCH = 0, then the interrupt enabled by the DMA interrupt enable register in the CPU used for DMA reads is ignored and the autoinitialization reads are not synchronized with any interrupt signals. If AUX AUTOINIT SYNCH = 1, then the interrupt enabled by the DMA interrupt enable register in the CPU used for DMA reads is also used to synchronize the autoinitialization reads. Affects the auxiliary channel in split mode only.
12 READ BIT REV: If READ BIT REV = 0, then the source address is modified using 32-bit linear addressing. If READ BIT REV = 1, then the source address is modified using 24-bit bit-reversed addressing. Affects unified mode and the primary channel in split mode.
13 WRITE BIT REV: If WRITE BIT REV = 0, then the destination address is modified using 32-bit linear addressing. If WRITE BIT REV = 1, then the destination address is modified using 24-bit bit-reversed addressing. Affects unified mode and the auxiliary channel in split mode.
14 SPLIT MODE: Controls the DMA mode of operation. If SPLIT MODE = 0, then DMA transfers are memory to memory. This is referred to as unified mode. If SPLIT MODE = 1, the DMA is split into two channels, allowing a single DMA channel to perform memory to communication port and communication port to memory transfers. May be modified by autoinitialization in unified mode or by autoinitialization by the auxiliary channel in split mode.
15-17 COM PORT: Defines a communication port to be used for DMA transfers. If SPLIT MODE = 0, then COM PORT has no effect on the operation of the DMA channel. If SPLIT MODE = 1, then COM PORT defines which of the six communication ports to use with the DMA channel. May be modified by autoinitialization in unified mode or by autoinitialization by the auxiliary channel in split mode.
18 TCC: Transfer counter interrupt control. If TCC = 1, a DMA channel interrupt pulse is sent to the CPU after the transfer counter makes a transition to zero and the write of the last transfer is complete. If TCC = 0, a DMA channel interrupt pulse is not sent to the CPU when the transfer counter makes a transition to zero. Affects unified mode and the primary channel in split mode. DMA channel interrupts to the CPU are edge triggered.
19 AUX TCC: Auxiliary transfer counter interrupt control. If AUX TCC = 1, a DMA channel interrupt pulse is sent to the CPU after the auxiliary transfer counter makes a transition to zero and the write of the last transfer is complete. If AUX TCC = 0, a DMA channel interrupt pulse is not sent to the CPU when the auxiliary transfer counter makes a transition to zero. Affects the auxiliary channel in split mode only.
The DMA channel interrupt pulse is sent if TCC = 1 and the transfer counter is 0 and the write of the last transfer is complete, or if AUX TCC = 1 and the auxiliary transfer counter is 0 and the write of the last transfer is complete.
20 TCINT FLAG: Transfer counter interrupt flag. This flag is set to 1 whenever a DMA channel interrupt pulse is sent to the CPU due to a transfer counter transition to zero and the write of the last transfer completing. Whenever the DMA control register is read, this flag is cleared unless the flag is being set by the DMA in the same cycle as the read. In this case TCINT is not cleared. Affected by unified mode and the primary channel in split mode.
21 AUX TCINT FLAG: Auxiliary transfer counter interrupt flag. This flag is set to 1 whenever a DMA channel interrupt pulse is sent to the CPU due to an auxiliary transfer counter transition to zero and the write of the last transfer completing. Whenever the DMA control register is read, this flag is cleared unless the flag is being set by the DMA in the same cycle as the read. In this case AUX TCINT is not cleared. Affected by the auxiliary channel in split mode only.
Since only one DMA channel interrupt is available for a DMA channel, the event that set the interrupt can be determined by examining TCINT FLAG and AUX TCINT FLAG.
22-23 START: Starts and stops the DMA channel in several different ways. Affects unified mode and the primary channel in split mode.
24-25 AUX START: Starts and stops the DMA channel in several different ways. Affects the auxiliary channel in split mode only.
The START and AUX START bits, if used to hold a channel in the middle of an autoinitialization sequence, will hold the autoinitialization sequence. If the START or AUX START bits are being modified by the DMA channel (for example, to force a halt code of 10 on a transfer counter terminated block transfer) and a write is being performed by an external source to the DMA channel control register, the internal modification of the START or AUX START bits by the DMA channel has priority.
26-27 STATUS: Indicates the status of the DMA channel. Updated in unified mode and by the primary channel in split mode. Updates are done every cycle.
28-29 AUX STATUS: Indicates the status of the DMA channel. Updated by the auxiliary channel in split mode only. In split mode, updates are done every cycle.
The STATUS and AUX STATUS bits are used to determine the current status of the DMA channels and to determine if the DMA channel has halted or been reset after writing to the START or AUX START bits.
30-31 Reserved: Read as 0.
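By way of illustration only, the bit positions given above for bits 10 through 29 of the control register can be unpacked in software as sketched below. The function name `decode_dma_control` and the dictionary keys are hypothetical, and the sketch assumes that bit 0 is the least significant bit of the 32-bit control word, which this section does not state.

```python
# Field layout for bits 10-29 of the DMA control register, as (lowest bit, width).
# Assumption: bit 0 is the least significant bit of the 32-bit word.
FIELDS = {
    "AUTOINIT_SYNCH": (10, 1),
    "AUX_AUTOINIT_SYNCH": (11, 1),
    "READ_BIT_REV": (12, 1),
    "WRITE_BIT_REV": (13, 1),
    "SPLIT_MODE": (14, 1),
    "COM_PORT": (15, 3),
    "TCC": (18, 1),
    "AUX_TCC": (19, 1),
    "TCINT_FLAG": (20, 1),
    "AUX_TCINT_FLAG": (21, 1),
    "START": (22, 2),
    "AUX_START": (24, 2),
    "STATUS": (26, 2),
    "AUX_STATUS": (28, 2),
}


def decode_dma_control(word):
    """Return a dict mapping each field name to its value in the control word."""
    return {
        name: (word >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in FIELDS.items()
    }
```

For example, a word with bit 14 set and the value 5 in bits 15-17 decodes to split mode using communication port 5.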

[0180] Source address generator 320 calculates a source address by adding the contents of source address register 305 to the contents of the corresponding source index register 306 and storing the result in source address register 305, so that source address register 305 contains the address from which data is to be transferred. Likewise, destination address generator 330 calculates a destination address by adding the contents of destination address register 303 to the contents of the corresponding destination index register 304 and storing the result in destination address register 303, so that destination address register 303 contains the address to which data is to be transferred. Depending upon the logic state of bit 12 (READ BIT REV) and of bit 13 (WRITE BIT REV) in control register 300, the source and destination address generators, respectively, can perform either linear (normal addition) or bit-reversed (reverse carry propagation) addition. The source index register 306 and the destination index register 304 hold signed values; thus, when combined respectively with the source address register 305 and destination address register 303, addresses may be incremented or decremented for DMA accesses. Data register 309 is a temporary register for buffering data from and to data lines 38 d of DMA bus 38; the value of data lines 38 d is loaded into data register 309 responsive to a signal on line WRITE, and the contents of data register 309 are presented to data lines 38 d responsive to a signal on line STORE.
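The two forms of address modification can be sketched as follows. This is an illustrative model only: reverse carry propagation is emulated here by reversing the bits of both operands, adding, and reversing the sum, which is one common software equivalent and is not asserted to be the circuit used in the address generators. The names `reverse_bits` and `next_address` and the `width` parameter are hypothetical.

```python
def reverse_bits(value, width):
    """Return `value` with its lowest `width` bits in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def next_address(address, index, bit_reversed, width=32):
    """Add a signed index to an address, linearly or with reverse carry.

    With bit_reversed=True the carry propagates from the most significant
    bit toward the least significant bit of a `width`-bit field.
    """
    mask = (1 << width) - 1
    if not bit_reversed:
        return (address + index) & mask
    total = reverse_bits(address & mask, width) + reverse_bits(index & mask, width)
    return reverse_bits(total & mask, width)
```

With a 3-bit field and an index of 4 (half the buffer size), repeated bit-reversed addition starting from 0 visits the addresses 0, 4, 2, 6, 1, 5, 3, 7, the ordering typically wanted for fast Fourier transform data.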

[0181] Control logic 310 is further connected to controller 14, so that the operation of DMA channel 21 is controlled consistently with the operation of the rest of microcomputer 10. As will be evident below, the DMA can be interrupt synchronized, so that the receipt or transmission of data from external sources can be done without conflict among CPU 12, DMA coprocessor 22, and the external source. START bit 300 a of control register 300 enables and disables the operation of DMA channel 21, while AUX START bit 300 b of control register 300 enables and disables the split-mode operation of the DMA coprocessor. A logic “1” state in the corresponding bit position enables operation and a logic “0” state disables operation. TCC bit 300 c of control register 300 controls ready logic 310 so that, when TCC bit 300 c is set to a “1” state, the DMA transfer is terminated upon transfer counter register 301 reaching zero. AUX TCC bit 300 d of control register 300 controls ready logic 310 in the same way as TCC bit 300 c, except that the DMA transfer is terminated upon auxiliary count register 302 reaching zero. Sync bits 300 e and 300 f allow configuration of the synchronization of DMA channel 21 with either the source or destination of the transferred data. TCINT bit 300 g, when set to a “1” state, creates an internal interrupt when the contents of transfer counter register 301 reach zero. Control logic 310 is connected to controller 14 to generate an internal interrupt signal on line 312, and to respond to the interrupt acknowledge signal from interrupt logic 250 on line 314. AUX TCINT bit 300 h functions like TCINT bit 300 g, except that it creates an internal interrupt when the contents of auxiliary count register 302 reach zero. The corresponding interrupt lines are 312 a and 314 a, for sending an interrupt to and receiving an acknowledge signal from interrupt logic 250, respectively.

[0182] The DMA operation performed under the control of DMA controller 22 can be interrupt-driven in conjunction with controller 14, so that the operation can be externally controlled. As described above relative to controller 14, internally generated interrupts are received and handled by interrupt logic 250 in controller 14. Control logic 310 further generates an interrupt request signal to controller 14 on line 313, and receives an interrupt active signal therefrom on line 315. The interrupt request signal on line 313 indicates that the DMA controller is waiting for a DMA-related interrupt generated by an external device, and the interrupt active signal on line 315 indicates that such an interrupt has been received by controller 14 and is to be serviced. Synchronization is controlled by control logic 310 generating the interrupt request signal at predetermined points in the DMA transfer cycle and waiting for the interrupt active signal before proceeding; the selection of the synchronization points is made by loading bits 300 e and 300 f of control register 300. Table 5 lists the synchronization modes performable by DMA coprocessor 22.
TABLE 5
Bits 300 e/f: Interrupt synchronization
00: No interrupt synchronization.
01: Source synchronization; DMA read on interrupt, write when available.
10: Destination synchronization; DMA read when available, write on interrupt.
11: Source and destination synchronization; DMA read on interrupt, write on next interrupt.
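The four synchronization modes of Table 5 can be summarized as the ordered steps of one transfer. The following is a sketch under stated assumptions: the function name `transfer_steps` and the step strings are hypothetical, and the two-bit code is read with its right-hand bit selecting source synchronization and its left-hand bit selecting destination synchronization, as the table rows imply.

```python
def transfer_steps(sync_bits):
    """Return the ordered steps of one DMA word transfer for a Table 5 code."""
    read_on_interrupt = bool(sync_bits & 0b01)   # codes 01 and 11
    write_on_interrupt = bool(sync_bits & 0b10)  # codes 10 and 11
    steps = []
    if read_on_interrupt:
        steps.append("wait for interrupt")
    steps.append("read")
    if write_on_interrupt:
        steps.append("wait for interrupt")
    steps.append("write")
    return steps
```

For code 11 the sketch yields two separate waits, matching the "write on next interrupt" wording of the table.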

[0183] In operation, the transfer counter register 301, destination address register 303, and source address register 305 of DMA channel 21 are first loaded with the initial conditions as desired. Each of these registers 301, 303 and 305 is addressable by address lines 28 a of peripheral bus 28 using a normal memory write instruction executed by microcomputer 10; implicit in FIG. 10 for each of the registers 301, 303 and 305 is decoding logic for controlling the loading of said registers 301, 303 and 305 when addressed. Control register 300 is also loaded by addressing its memory location, thereby configuring DMA channel 21 as desired. Control logic 310 is responsive to START bit 300 a being set to a “1” state, enabling the operation of DMA controller 22.

[0184] By way of example, control register 300 of DMA channel 21 is loaded with the necessary data so that the selected synchronization mode will be destination synchronization. Thus, control logic 310 will first disable control logic 310 from accepting internal interrupt signals from interrupt logic 250. The source address register 305 of DMA channel 21 is loaded with the address of the source memory. The destination address register 303 (of DMA channel 21) is loaded with the address of the destination memory, and transfer counter 301 is loaded with the number of words to be transferred. According to the example, control register 300 is configured for sequential transfer of both the source and the destination data; thus, source index register 306 and destination index register 304 are set to 1. The START bit of control register 300 initiates the DMA transfer.

[0185] Control logic 310 sends signals CALS and CALD to the source address and destination address generators to calculate source and destination addresses for data and to store the addresses in the source address register 305 and destination address register 303. Upon a LOAD pulse from control logic 310 to source address register 305, the contents of source address register 305 will be placed on address lines 38 a of DMA bus 38. The addressed memory location (either in external memory via peripheral port 24 or 26, or in memories 16, 18 or 20) will be read. Control logic 310 will pulse the STORE line connected to data register 309, to load the value on data lines 38 d of DMA bus 38 into data register 309. After the read operation, control logic 310 pulses CALS and the contents of source index register 306 are added to the contents of source address register 305, with the result written back to source address register 305. Also during this time, DECR is pulsed by control logic 310, decrementing the count of the transfer counter register 301 by one.

[0186] According to the destination synchronization mode selected by control register 300, control logic 310 will now generate an interrupt request signal on line 313 to interrupt logic 250. Responsive to controller 14 receiving an enabled interrupt directed to DMA, such an event being communicated to the DMA controller by the interrupt active signal on line 315, control logic 310 will begin the DMA write operation. Accordingly, the contents of destination address register 303 will be presented upon address lines 38 a of DMA bus 38 responsive to control logic 310 presenting the LOAD signal to destination address register 303. Control logic 310 also pulses the WRITE line connected to data register 309, so that the contents of data register 309 are presented upon data lines 38 d of DMA bus 38. The addressed memory location is accessed as described before, with controller 14 providing the necessary write control signals to effect the writing of the contents of data register 309 into the addressed location.

[0187] After completing the write, the contents of destination address register 303 are added to the contents of destination index register 304 by control logic 310 via line CALD, with the result written back to destination address register 303. It should be noted that separate source and destination index registers allow for variable step sizes or for continual reads and/or writes from/to a fixed location.
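The read, index, decrement and write sequence described above can be modeled in a few lines. This is a simplified sketch that omits interrupt synchronization and bus timing; the function name `dma_transfer` and the use of a Python list as memory are assumptions made for illustration.

```python
def dma_transfer(memory, source, source_index, destination, destination_index, count):
    """Copy `count` words within `memory`; return the final (source, destination)."""
    while count > 0:
        data = memory[source]             # read cycle into the data register
        source += source_index            # post-modify source address
        count -= 1                        # decrement the transfer counter
        memory[destination] = data        # write cycle from the data register
        destination += destination_index  # post-modify destination address
    return source, destination
```

Setting an index to 1 gives sequential transfers, a negative index steps backward, and an index of 0 keeps the address fixed, for example when every word is written to a single memory-mapped port.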

[0188] DMA transfers continue until transfer counter 301 goes to zero and the write of the last transfer is complete. The DMA channel 21 has the ability to reinitialize another set of source and destination addresses to perform another DMA transfer without intervention by CPU 12. When the TRANSFER MODE bits are set to 10 (refer to Table 6) in control register 300, the link pointer register 307 initializes the registers which control the operation of the DMA channel. The link pointer register 307 contains the address of a structure in memory for a new control register and other pertinent values which are loaded into the registers of DMA channel 21, such as: source address register, source index register, destination address register, destination index register, link pointer register, and auxiliary registers if using split-mode operation. It should be noted that autoinitialization of the DMA channel occurs without intervention by CPU 12.
TABLE 6 The effect of the TRANSFER MODE field.
TRANSFER MODE: Effect
00: Transfers are not terminated by the transfer counter and no autoinitialization is performed. TCINT can still be used to cause an interrupt when the transfer counter makes a transition to zero. The DMA channel continues to run.
01: Transfers are terminated by the transfer counter. No autoinitialization is performed. A halt code of 10 is placed in the START field.
10: Autoinitialization is performed when the transfer counter goes to zero without waiting for CPU intervention.
11: The DMA channel is autoinitialized when the CPU restarts the DMA using the DMA register in the CPU. When the transfer counter goes to zero, operation is halted until the CPU starts the DMA using the DMA start field in the CPU DMA register, and a halt code of 10 is placed in the START field by the DMA.
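Autoinitialization through the link pointer behaves like walking a linked list of transfer descriptors. The sketch below is illustrative only and starts at the moment the transfer counter has reached zero with TRANSFER MODE = 10. The seven-word descriptor layout (transfer mode, source address, source index, transfer count, destination address, destination index, next link pointer) is an assumption; the actual ordering of the structure in memory is not given in this section, and a real control register holds more than the TRANSFER MODE field.

```python
AUTOINIT = 0b10  # TRANSFER MODE code for autoinitialization (Table 6)


def run_dma_chain(memory, link_pointer, max_blocks=16):
    """Run chained block transfers; return the number of blocks moved.

    Each descriptor occupies seven consecutive words (assumed layout):
    mode, source, source index, count, destination, destination index, link.
    """
    blocks = 0
    while blocks < max_blocks:
        # Autoinitialization: reload the channel registers from memory.
        mode, src, src_idx, count, dst, dst_idx, link_pointer = (
            memory[link_pointer:link_pointer + 7]
        )
        while count > 0:
            memory[dst] = memory[src]
            src += src_idx
            dst += dst_idx
            count -= 1
        blocks += 1
        if mode != AUTOINIT:
            break  # counter reached zero with no further autoinitialization
    return blocks
```

A chain ends when a descriptor carries a mode other than 10; the `max_blocks` guard is a safety limit of the sketch, since a descriptor whose link points to itself would otherwise repeat indefinitely.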

[0189] In the preferred embodiment, any one of the six DMA channels can operate in conjunction with any one of the six communication ports 50-55 using a special DMA transfer mode called split-mode operation, as shown in FIG. 11. Split-mode operation separates one DMA channel into two concurrent operations: one dedicated to receiving data from a communication port and writing the data to a location in the memory map, and one dedicated to reading data from a location in the memory map and writing the data to a communication port. The control register 300 has a SPLIT MODE bit that can be set to indicate split-mode operation and COM PORT bits to select which communication port is used for split-mode operation (refer to Table 4 register bit 14). During split-mode operation, the DMA channel dedicated to reading data operates independently from the DMA channel dedicated to writing data. Thus, an auxiliary count register and an auxiliary pointer register for the DMA channel are dedicated to writing data (auxiliary channel) and respectively correspond to the transfer count registers and link pointer registers used for the DMA channel dedicated to reading data (primary channel). It should be noted that there are six auxiliary count registers and six auxiliary pointer registers, one for each DMA channel.

[0190] In the preferred embodiment, as many as six DMA channels may be accessing the DMA bus 38 at the same time (and as many as twelve when all six DMA channels are operating in split mode in conjunction with all six communication ports). Thus, contained within coprocessor 22 is a priority controller (not shown) that implements a rotating priority scheme. The last DMA channel to get service becomes the lowest priority DMA channel. The other DMA channels rotate through a priority list, with the next lower DMA channel from the DMA channel serviced having the highest priority on the following request. The priority rotates every time the most recent priority-granted channel completes its access. FIG. 12a illustrates the rotation of priority across several DMA coprocessor accesses. An asterisk indicates the DMA channel requesting service. When a DMA channel is running in split mode, the arbitration between channels is similar to that just discussed for unified DMA channels. The split-mode DMA channel participates in the rotating priority scheme with the same priority as if it were a unified DMA channel.
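The rotating priority scheme can be sketched as a search that begins just after the channel serviced last. The sketch assumes channels numbered 0 through 5 and reads "the next lower DMA channel" as the next channel number in ascending order with wraparound; the function name `arbitrate` is hypothetical.

```python
def arbitrate(requests, last_served, num_channels=6):
    """Return the requesting channel with the highest rotating priority.

    `requests` is a set of channel numbers. The channel serviced last has
    the lowest priority; the one numbered after it has the highest.
    Returns None when no channel is requesting.
    """
    for offset in range(1, num_channels + 1):
        channel = (last_served + offset) % num_channels
        if channel in requests:
            return channel
    return None
```

Because the last-serviced channel is examined last, a channel that requests continuously cannot starve the others.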

[0191] The split-mode DMA channel complicates the process by having a primary channel transfer and an auxiliary channel transfer. Since primary and auxiliary channels can run independent of each other, the two subchannels compete for priority within the host DMA channel while the host DMA channel competes with the other unified DMA channels. FIG. 12b illustrates this priority mechanism that is controlled by the priority controller (not shown) contained within coprocessor 22. In this case assume that only channel two is running in split mode. The primary channel is designated as 2pri and the auxiliary channel as 2aux. Again, an asterisk (*) indicates the DMA channel requesting service. The first service is a request by 2pri. After 2pri is serviced, channel 2 is moved to the lowest priority level, and 2pri is moved to a lower priority level below 2aux within channel 2. It should be noted that the two subchannels (2pri and 2aux) are prioritized within themselves. Channel 4, having a higher priority than channel 2, is serviced next. On the third service 2pri is serviced. On the 4th service, with 2aux and 2pri both requesting, 2aux is serviced first, channel two becomes the lowest priority channel and 2aux becomes lower priority than 2pri. On the 5th service channel 3 is serviced. If no higher priority services are requested, 2pri would be serviced next.
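The two-level arbitration described for a split-mode channel can be sketched as follows. This is a hypothetical model, not the priority controller itself: requests are represented as (channel, subchannel) pairs with a subchannel of None for unified channels, the priority list is an ordered Python list with the highest priority first, and the function name `service` is an assumption.

```python
def service(order, preferred, requests, split_channel):
    """Grant one request; return (served, new_order, new_preferred).

    order: channel numbers, highest priority first.
    preferred: "pri" or "aux", the subchannel currently favored within
        the split-mode channel.
    requests: set of (channel, sub) pairs, sub being None, "pri" or "aux".
    """
    other = {"pri": "aux", "aux": "pri"}
    for channel in order:
        if channel == split_channel:
            candidates = [(channel, preferred), (channel, other[preferred])]
        else:
            candidates = [(channel, None)]
        served = next((c for c in candidates if c in requests), None)
        if served is not None:
            position = order.index(channel)
            # The serviced channel drops to the lowest priority.
            new_order = order[position + 1:] + order[:position + 1]
            if served[1] is not None:
                # The serviced subchannel drops below its sibling.
                preferred = other[served[1]]
            return served, new_order, preferred
    return None, order, preferred
```

Replaying the five services described above (2pri, then channel 4, then 2pri, then 2aux ahead of 2pri, then channel 3) with an initial order of 0 through 5 reproduces that sequence under these assumptions.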

[0192] As is evident from this description, DMA coprocessor 22 is thus operable to transfer the contents of memory locations from memory beginning with the initial source address, to memory locations beginning with the destination address. After completion of the transfers, the DMA coprocessor can autoinitialize itself by fetching from memory the necessary information to perform another DMA transfer sequence. This operation as described herein does not require the intervention of CPU 12 and, since DMA bus 38 provides a separate address and data path for DMA purposes, can allow such a DMA operation to take place simultaneously with program and data accesses in the normal operation of microcomputer 10. DMA operations can occur essentially transparent to the operation of microcomputer 10, greatly enhancing its performance.

[0193] Referring now to FIG. 13, the operation of peripheral bus 28, and its communication with various peripheral functions, will be explained. By way of example, timers 40 and 41, analysis module 42 and communication ports 50-55 are the peripheral functions connected to microcomputer 10 described herein. These three functions provide certain communication and/or data processing functions depending upon their construction, but each of said peripheral functions communicates with peripheral bus 28, and thereby with the rest of microcomputer 10, in the same manner. Each of peripherals 40, 41, 42 and 50-55 is configured and operated by microcomputer 10 by using memory-mapped registers, addressable by peripheral bus 28, in the manner described below. It should be recalled that, as in the case of the memory-mapped registers contained within DMA controller 22, the memory-mapped registers contained in the peripheral functions described below reside in the input/output address space 000100000h through 0001000FFh. The preferred embodiment of microcomputer 10 includes two timers; each timer operates independently of the other. Only timer 40 will be described in detail herein below, because timer 41 has functions similar to those of timer 40 and the registers of timer 41 correspond to those of timer 40. For example, timer logic 400 corresponds with timer logic 410, control register 402 corresponds with control register 412, period register 404 with period register 414, counter register 406 with counter register 416, and TCLK1 with TCLK2.

[0194] Timer 40 performs the function of measuring predetermined time periods for external control, or for internal control of microcomputer 10. Timer 40 contains timer logic 400, connected to address lines 28 a of peripheral bus 28; timer logic 400 is operable to evaluate the address signal on lines 28 a of peripheral bus 28, and to allow access to the various memory-mapped registers within timer 40 accordingly. Each of the registers within timer 40 (described below) is addressable by an address signal within the single address space of microcomputer 10. The memory-mapped registers within timer 40 include a control register 402 which contains certain control information necessary to control the operation of timer 40, such as an enable/disable bit, and such as whether timer 40 is controlled by the system clock of microcomputer 10 to provide an external output, or is controlled by external clock pulses to provide an internal signal. Timer 40 further contains addressable period register 404, which is loaded from data lines 28 d with the value specifying the period of time to be measured by timer 40. Counter register 406 is also contained within timer 40, and is incremented by each pulse of either the system clock or a clock pulse received on line TCLK1 externally. In operation, timer logic 400 is responsive to the contents of counter register 406 equaling the contents of period register 404, at which time timer logic 400 generates an internal interrupt signal to controller 14 if control register 402 has so chosen; if control register 402 has selected external output, timer logic 400 generates a pulse on line TCLK1 when the contents of counter register 406 equal the contents of period register 404.
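The comparison of the counter register against the period register can be modeled as sketched below. The class name `Timer` and its method are hypothetical, and resetting the counter to zero on a match is an assumption of the sketch; this section does not state what happens to counter register 406 after the match.

```python
class Timer:
    """Minimal model of a period/counter timer."""

    def __init__(self, period):
        self.period = period   # models period register 404
        self.counter = 0       # models counter register 406

    def tick(self):
        """Advance one clock pulse; return True when counter equals period."""
        self.counter += 1
        if self.counter == self.period:
            self.counter = 0   # assumption: counter restarts after a match
            return True        # interrupt or TCLK1 pulse would be generated here
        return False
```

With a period of 3, every third call to `tick` reports a match, which corresponds to either the internal interrupt or the external pulse depending on the control register setting.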

[0195] Analysis module 42 provides improved emulation, simulation and testability architectures and methods which provide visibility and control without physical probing or special test fixtures. One such analysis module is described in co-pending and co-assigned U.S. application Ser. No. 388,270 filed Jul. 31, 1989 (TI Docket 14141). Some features supported by analysis module 42 are specifically discussed below. A trace feature enables tracing of the start address of the previous program block, the end address of the previous program block, and the start address of the current block, with the current program counter (PC) equal to the end address of the current block. This facilitates a reverse assembly of where the program has come from and allows a trace back feature to be implemented in combination with the PC context switch breakpoints.

[0196] Sufficient machine state information is implemented to retrieve the last program counter executed and to determine if any repeat single, repeat block, or delayed instruction is active. The machine state information also recalls the machine states required to restart execution from these cases in any of the CPU stop modes. A stop may occur within repeats. Single stepping of the code results in a single instruction being executed. This means only one instruction within a repeat single or block loop is executed.

[0197] Faster downloads are supported by implementing short scan paths in the CPU. Short scan paths are accomplished using a partial scan of the CPU and a HLT applied to the CPU MPSD test port.

[0198] The behavior of the memory interface differs during emulation mode and simulation mode. In emulation mode, control of the memory interface allows normal operation of the interface to continue while the CPU domain test port is in a scan, pause or halt state. Control signals remain inactive in a high impedance state while Hold functions continue to operate. Memory control signals are to be asserted in the inactive state with correct timing when the system domain test port is in a pause state or scan state. Control signals cannot toggle or glitch because of MPSD test port code changes. In simulation mode, control of the interfaces is such that the control signals are asserted in the machine state with correct timing when the system domain test port is in a SDAT, SCTRL, or PAUS state. Memory interface logic (hold_, holda) does not function unless the system test port is in the CNTRL or FUNC state and suspend is not active. Simulation mode slaves the system domain clock to the CPU domain execution clock, with MPSD codes FUNC, CNTRL, or HLT applied.

[0199] Peripherals have independence of operation when the chip is operating in the emulation mode. In simulation mode their operation is tightly coupled to the CPU domain. The peripherals may have from one to three of the following operating modes when the chip is operating in the emulation mode: free, soft and hard. When a peripheral, such as a timer, is allowed to have up to three modes, the specific mode is made available to the user through two dedicated bits in a peripheral control register. These bits do not affect the operation of the peripherals provided the system test port has FUNC applied.

[0200] Peripheral free mode means the peripheral continues to operate normally regardless of the CPU domain state or the state of SUSPEND, provided the system test port has CNTRL applied.

[0201] Peripheral soft allows the coupling of a CPU or system assertion of SUSPEND (i.e., CPU domain halted) with the subsequent halt of the peripheral. With peripheral soft, the peripheral continues to operate normally after SUSPEND is asserted until a predefined condition within the peripheral occurs. When this event occurs the peripheral halts execution. The peripheral resumes execution when SUSPEND becomes inactive and the system test port has CNTRL applied.

[0202] Peripheral hard allows the direct coupling of a CPU or system assertion of SUSPEND (i.e., CPU domain halted) with an immediate halt of the peripheral. With peripheral hard, the peripheral appears as if it is tightly coupled to the CPU domain, halting immediately when SUSPEND is asserted. This assumes the system test port has CNTRL applied. When this occurs the peripheral halts execution. The peripheral resumes execution when SUSPEND becomes inactive and the system test port has CNTRL applied. This mode makes the peripheral execute the same number of clocks of user code as the CPU domain executes.
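The free, soft and hard peripheral modes can be compared with a small decision function. The sketch assumes emulation mode with CNTRL applied to the system test port throughout; the function name `peripheral_halted` and the `condition_seen` argument (true once the predefined condition within the peripheral has occurred while SUSPEND is asserted) are hypothetical.

```python
def peripheral_halted(mode, suspend, condition_seen):
    """Return True if the peripheral is halted, assuming CNTRL is applied.

    mode: "free", "soft" or "hard".
    suspend: True while SUSPEND is asserted.
    condition_seen: True once the peripheral's predefined stopping
        condition has occurred since SUSPEND was asserted.
    """
    if mode == "free":
        return False                       # ignores SUSPEND entirely
    if mode == "hard":
        return suspend                     # halts immediately on SUSPEND
    if mode == "soft":
        return suspend and condition_seen  # halts only at the predefined point
    raise ValueError("unknown peripheral mode: " + mode)
```

In all three modes the function returns False once SUSPEND is inactive, which corresponds to the peripheral resuming execution.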

[0203] Peripheral operation in the Simulation Mode is controlled by the System test port, suspend, and the CPU test port. The peripheral clocks may run when the CPU domain and the System domain test ports have CNTRL applied, the CPU clocks are on, and SUSPEND is not active.

[0204] Five instructions are used in the emulation architecture to manage analysis and emulation requested stops. These instructions are:

[0205] a) ESTOP—Emulation Stop

[0206] b) ETRAP—Emulation Trap

[0207] c) ASTOP—Analysis Stop

[0208] d) ATRAP—Analysis Trap

[0209] e) ERET—Emulation Return

[0210] These instructions provide the mechanism whereby Emulation SW and Analysis generated execution halt requests are processed in conjunction with TRAPEN, allowing the determination of the cause of the trap or stop. The emulation return instruction is separate from a normal return because the two trap instructions set a suspend bit (TRPSUSP) and the emulation return instruction resets this bit. The emulation and analysis traps and returns are identical to normal traps and returns with the exception of managing TRPSUSP.

[0211] Emulation stop (ESTOP) is placed in memory by the Emulation SW or imbedded in the functional code by the user or compiler. It causes a stop with the pipeline empty regardless of the CPU stop mode. Execution of this instruction causes an associated emulation interrupt. An ESTOP status is set in the CPU and instruction fetches to fill the pipeline do not occur until this flag is reset by Emulation SW. The pipeline may be loaded with a non-empty state while this flag is set, and the pipeline executes to the empty state when CPU test port codes HLT or CNTRL are applied. FUNC causes this flag to be reset.

[0212] Emulation trap (ETRAP) is placed in memory by the Emulation SW or imbedded in the functional code by the user or compiler. If TRAPEN is true to the CPU, this instruction causes a trap, sets TRPSUSP, and generates an associated emulation interrupt. The pipeline is empty behind it. When TRAPEN is not true to the CPU, the instruction is executed and the emulation interrupt is generated, but TRPSUSP is not set and the trap is not taken. In both cases an ETRAP status flag is set in the analysis domain. This bit is resettable by scan.

[0213] Analysis stop (ASTOP) is jammed into the instruction pipeline at the earliest time when the analysis requests a stop condition and TRAPEN is false to the CPU. ASTOP has the same characteristics as ESTOP except it has its own status flag, which has the same characteristics as the ESTOP status flag.

[0214] Analysis trap (ATRAP) is jammed into the instruction pipeline at the earliest time when the analysis requests a stop condition and TRAPEN is true to the CPU. This instruction causes a trap, sets TRPSUSP, and generates an associated emulation interrupt. The pipeline is empty behind it. An ATRAP status flag is set in the analysis domain. This bit is resettable by scan.

[0215] Emulation return (ERET) resets TRPSUSP and otherwise acts like a normal return instruction.
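The interaction of TRAPEN and TRPSUSP across the five instructions can be summarized as sketched below. This models only the two bits discussed above and not the pipeline, status flags or emulation interrupt; the function names `analysis_request` and `trpsusp_after` are hypothetical.

```python
def analysis_request(trapen):
    """Return the instruction jammed into the pipeline on an analysis stop request."""
    return "ATRAP" if trapen else "ASTOP"


def trpsusp_after(instruction, trapen, trpsusp):
    """Return the value of TRPSUSP after executing `instruction`."""
    if instruction == "ATRAP":
        return True                          # trap taken, suspend bit set
    if instruction == "ETRAP":
        return True if trapen else trpsusp   # set only when the trap is taken
    if instruction == "ERET":
        return False                         # emulation return resets the bit
    return trpsusp                           # ESTOP and ASTOP leave it unchanged
```

A trap followed by ERET therefore returns TRPSUSP to the reset state, which is the only difference from a normal trap and return described in this section.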

[0216] The message status register contains status information for controlling the transfer of data and commands to and from the device. These status bits are readable and some are writable. The status bits and their bit numbers are:
a) WBFUL: write buffer full (bit 4)
b) RBFUL: read buffer full (bit 3)
c) CMD: command transfer (bit 2)
d) GXFER: good transfer (bit 1)
e) MACK: message acknowledge (bit 0)
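For illustration, the five status bits can be expressed as masks and decoded from a status value. The sketch assumes bit 0 is the least significant bit; the function name `describe_status` is hypothetical.

```python
# Bit masks for the message status register, from the bit numbers listed above.
# Assumption: bit 0 is the least significant bit.
WBFUL = 1 << 4  # write buffer full
RBFUL = 1 << 3  # read buffer full
CMD = 1 << 2    # command transfer
GXFER = 1 << 1  # good transfer
MACK = 1 << 0   # message acknowledge

_NAMES = [(WBFUL, "WBFUL"), (RBFUL, "RBFUL"), (CMD, "CMD"),
          (GXFER, "GXFER"), (MACK, "MACK")]


def describe_status(status):
    """Return the names of the status bits set in `status`, high bit first."""
    return [name for mask, name in _NAMES if status & mask]
```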

[0217] ABUSACT indicates that the analysis test port has HLT, CNTRL, or FUNC applied.

[0218] The WBFUL status bit is in the analysis domain. It is set via a device write to the message register when the RBFUL flag is not true and ABUSACT is true. This bit is reset via scan.

[0219] The RBFUL status bit is in the analysis domain. It is set via scan and reset via a read to the CMD address of the MSG register when CMD is set, or a read to the data address of the MSG register when CMD is not set, provided ABUSACT is true in both read instances.

[0220] The CMD status bit is in the analysis domain. It is set via a device write to the command message register address when the RBFUL flag is not true and ABUSACT is true. It is reset when a write occurs to the data message register address and the RBFUL flag is not true and ABUSACT is true. The CMD bit is scannable and settable to either logical value.

[0221] The GXFER status bit is in the system domain. It is set when:

[0222] a) A read to the command message address occurs, CMD is true, RBFUL is true, and ABUSACT is true;

[0223] b) A read to the data message address occurs, CMD is false, RBFUL is true, and ABUSACT is true;

[0224] c) A write to a data or command message address occurs, RBFUL is false, and ABUSACT is true.

[0225] The GXFER bit is reset on system FUNC or a read or write to a message register address without a, b, or c being true.
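Conditions a) through c) amount to a single predicate evaluated on each access to a message register address. The following sketch returns the value GXFER takes after such an access; the function name `gxfer_after_access` and the string arguments are hypothetical, and the reset on system FUNC is not modeled.

```python
def gxfer_after_access(access, address, cmd, rbful, abusact):
    """Return the GXFER value following one message register access.

    access: "read" or "write".
    address: "command" or "data", the message address accessed.
    cmd, rbful, abusact: current values of CMD, RBFUL and ABUSACT.
    """
    if not abusact:
        return False                     # none of a), b), c) can hold
    if access == "read":
        if address == "command":
            return cmd and rbful         # condition a)
        return (not cmd) and rbful       # condition b)
    return not rbful                     # condition c), either address
```

A read from the data address while CMD is set, for example, yields False, signaling that the access did not constitute a good transfer.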

[0226] Message acknowledge (MACK) is a writable and readable bit connected to the emulation control block and resides in the system domain. The MACK bit is selectable to appear on EMUO pin and it serves as the handshaking for message transfers.

[0227] The message passing register and message register status bits in the analysis domain are on a short analysis scan path. The short analysis scan path is the first path out of the analysis domain. The message register is the first output, followed by the message status register bits. It should be noted that both the message passing register and the message register status bits are transferred out in an order starting with the least significant bit (LSB).

[0228] In one variation of the preferred embodiment, another microcomputer similar to the microcomputer 10 herein described is directly coupled to microcomputer 10 via one or more or all of the communication ports 50-55. FIG. 14 illustrates the connection between two microcomputers 10 where one communication port is connected to the other communication port via control and data signals 585. When two microcomputers 10 are coupled via the communication ports, the input and output FIFO registers are combined and thus the number of FIFO registers is doubled. The buffering capacity of the combined communication port is the sum of the capacities of the individual communication ports. The two coupled microcomputers 10 have provisions for pin-for-pin compatibility, enabling the two microcomputers to connect directly via any one of the six communication ports 50-55. It should be noted that with pin-for-pin compatibility between microcomputers 10, the microcomputers are readily connected using the communication ports.

[0229] Referring now to FIG. 15, the operation of communication ports 50-55 will be explained. FIG. 15 shows the internal architecture of communication port 50, which for purposes of this discussion is functionally identical to the other five communication ports. In order for data transfer to occur with communication ports 50-55, the desired address presented via peripheral bus 28 is made to correspond to a value within the memory address space of microcomputer 10 that corresponds to an address serviced by peripheral port 25. The memory-mapped registers within communication ports 50-55 which are described below are within the memory address space 000100040h through 00010009Fh.

[0230] Communication port 50 contains port control register 510, input first-in-first-out (FIFO) 540, and output FIFO 550, each of which is connected to address lines 28 a and data lines 28 d of peripheral bus 28, and each of which is mapped into corresponding address locations of the memory address space of microcomputer 10. The input FIFO 540 and the output FIFO 550 each have a corresponding FIFO control that is attached to the respective FIFO unit. Communication port 50 further contains an interface port 530. A port arbitration unit 520 provides handshaking signals to an external device for effectuating data transfers from or to interface port 530. The port control register 510 contains control and status bits for the communication channel. Port logic unit 560 controls the interfacing among the port arbitration unit 520, input and output FIFO units 540 and 550, and the port control register 510. The port logic unit 560 also provides interrupts to the interrupt logic 250.

[0231] In order to transmit data, a qualifying token is used for data flow control of the connected communication port. For example, a signal on line BUSRQ from port logic unit 560 to port arbitration unit 520 signals the port arbitration unit 520 to arbitrate for control over the eight-bit communication channel data bus CD(7-0) against external requests to use the data bus. It should be noted that arbitrating is not necessary if port arbitration unit 520 has possession of the qualifying token. The qualifying token is used to determine whether communication port 50 or an external port has control of the communication channel data bus. The qualifying token is passed between the port arbitration unit 520 of communication port 50 and the external port. The port arbitration unit 520 is a state machine having four defined states. Table 7 defines these states.

TABLE 7
Definition of PAU states

PAU STATE   PAU Status
00          PAU has token (PORT DIR = 0) and channel not in use (OUTPUT LEVEL = 0).
01          PAU does not have token (PORT DIR = 1) and token not requested by PAU (OUTPUT LEVEL = 0).
10          PAU has token (PORT DIR = 0), channel in use (OUTPUT LEVEL not = 0).
11          PAU does not have token (PORT DIR = 1), token requested by PAU (OUTPUT LEVEL not = 0).

[0232] These four states aid in determining whether or not the token can be passed to the requesting communication port and are defined in terms of status information that is available in the port control register 510. FIG. 16 shows the state diagram and controlling equations for the state transitions of the port arbitration unit 520.
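The mapping of Table 7 from status information to PAU state can be sketched as follows. This is a minimal illustration only, assuming that the high-order state bit reflects whether OUTPUT LEVEL is nonzero and the low-order bit equals PORT DIR, as the table entries suggest; the function name `pau_state` and its parameters are hypothetical and do not appear in the described embodiment.

```python
def pau_state(port_dir: int, output_level: int) -> str:
    """Return the two-bit PAU state of Table 7 as a string such as '10'.

    port_dir:     0 if the PAU has the token, 1 if it does not (PORT DIR).
    output_level: number of words waiting in the output FIFO (OUTPUT LEVEL).
    """
    if port_dir not in (0, 1):
        raise ValueError("PORT DIR is a single bit")
    if output_level < 0:
        raise ValueError("OUTPUT LEVEL cannot be negative")
    # High bit: channel in use (with token) or token requested (without token).
    high_bit = 1 if output_level != 0 else 0
    # Low bit: 0 = PAU has the token, 1 = PAU does not have the token.
    low_bit = port_dir
    return f"{high_bit}{low_bit}"


if __name__ == "__main__":
    for port_dir in (0, 1):
        for level in (0, 2):
            print(port_dir, level, pau_state(port_dir, level))
```

The sketch covers only the state definitions; the transition equations of FIG. 16 are not reproduced here.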

[0233] For this example, communication port 50 is connected to an external port similarly equipped as shown in FIG. 14. Operation begins with port arbitration unit 520 of communication port 50 in state 00 (with token, channel not in use) connected to a port arbitration unit of the external port in state 01 (without token, token not requested). Communication port 50 is instructed to transmit data to the external port. Port arbitration unit 520 receives a request from port logic unit 560 on line BUSRQ to use the communication port data bus. Port arbitration unit 520 allows the output FIFO to transmit one word immediately, since it has the token, and enters state 10 (with token, channel in use). After the output FIFO transmits that one word, port logic unit 560 removes the bus request (BUSRQ=0) and then port arbitration unit 520 returns to state 00.

[0234] Next, when the port arbitration unit of the external port receives a request from its port logic unit to use the bus (BUSRQ), it requests the token from port arbitration unit 520 over the CREQ_ line and enters state 11 (without token, token requested). This request is seen inside state machine 525 of port arbitration unit 520 via the state variable TOKRQ. When port arbitration unit 520 is in state 00 (with token, channel not in use), the token is transferred using the CACK_ line. When the port arbitration unit of the external port receives the bus, this is signalled internally within the port arbitration unit by a bus acknowledge signal (BUSACK). As a result of the token transfer, port arbitration unit 520 enters state 01 (without token, token not requested) and the port arbitration unit of the external port enters state 10 (with token, channel in use). It should be noted that communication port 50 is not limited to communications with external ports similarly equipped but can interface to external ports that provide proper handshaking signals.

[0235] Since port arbitration unit 520 always returns to state 00 after transmitting a single word, tokens may be passed back and forth, allowing for a word to be transmitted from communication port 50 to the external port and then from the external port to communication port 50. This provides an inherently fair means of bus arbitration by preventing any one output FIFO from continually monopolizing the communication data bus, and thus preventing the other output FIFO module from being continually blocked. In other words, commensurate loading of the FIFOs is accomplished. If an input FIFO becomes full, a signal INW is sent to port arbitration unit 520, which causes I/O port 531 not to bring CRDY_ low, because otherwise, at the start of the next transmission, the first incoming eight bits would overflow the input FIFO and data would be lost.
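The alternating behavior described above can be illustrated with a small behavioral sketch. It is not a model of the state machine of FIG. 16: it assumes only that the token holder sends at most one word before returning to state 00, and that a holder in state 00 hands the token over whenever the other side has data waiting. The class `Port` and the function `run_link` are hypothetical names introduced for illustration.

```python
from collections import deque


class Port:
    """One end of the link: an output FIFO plus a token flag (illustrative only)."""

    def __init__(self, name, words, has_token):
        self.name = name
        self.out_fifo = deque(words)
        self.has_token = has_token


def run_link(a, b):
    """Return the order in which words cross the link as (sender, word) pairs."""
    sent = []
    while a.out_fifo or b.out_fifo:
        holder, other = (a, b) if a.has_token else (b, a)
        if holder.out_fifo:
            # State 00 -> 10: the holder sends exactly one word, then returns to 00.
            sent.append((holder.name, holder.out_fifo.popleft()))
        if other.out_fifo:
            # The other side is requesting (state 11); the holder, back in
            # state 00, transfers the token.
            holder.has_token, other.has_token = False, True
    return sent


if __name__ == "__main__":
    port_a = Port("A", [1, 2, 3], has_token=True)
    port_b = Port("B", [10, 20], has_token=False)
    print(run_link(port_a, port_b))
```

With both output FIFOs loaded, the words interleave one at a time, so neither side can hold the bus for more than one word while the other is waiting.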

[0236] Another feature incorporated into the communication ports is the ability to effectuate input and output FIFO halting. Input and output FIFO halting is the ability to prevent additional transfers from and to the output and input FIFOs, respectively. During system development, debugging and use, the ability to stop an input and output FIFO without the loss of any data that is being sent or received is a very desirable feature. In the preferred embodiment, after a transfer of a word via the communication channel bus, the port arbitration unit 520 returns to state 00. When either the input channel halt (ICH=1) or the output channel halt (OCH=1) is set in the port control register 510, the port logic unit in turn sends signal HOLDTOK to port arbitration unit 520. Port arbitration unit 520 has two options after receipt of the HOLDTOK signal. If it has possession of the qualifying token, it refuses to relinquish the token, thus preventing data from entering input FIFO 540 via the communication channel bus; otherwise, it refuses to arbitrate for the qualifying token, thus stopping output FIFO 550 from transmitting data via the communication channel bus.

[0237] For example, input FIFO 540 of communication port 50 (connected to external port) has ICH=1. Then the input FIFO 540 is halted based upon the communication channel's current state. The input channel is unhalted when ICH=0. When the input FIFO 540 of communication port 50 is unhalted (ICH=0), communication port 50 releases the qualifying token if requested.

[0238] Output FIFO halting is analogous to input FIFO halting. For example, output FIFO 550 of communication port 50 (connected to external port) has OCH=1. Then the output FIFO 550 is halted based upon its current state. If communication port 50 does not have the qualifying token, output FIFO 550 is halted by communication port 50 not requesting the qualifying token. If the communication port 50 has the qualifying token and is currently transmitting a word, then after the transmission is complete, no new transfers will be initiated.

[0239] Following the FIFO halting rules discussed above, other possible scenarios of the preferred embodiment include: 1) communication port 50 has the qualifying token, input FIFO 540 is not halted, and output FIFO 550 is halted, then it will transfer the token when requested by the external port; 2) communication port 50 has the qualifying token, input FIFO 540 is halted, and output FIFO 550 is halted, then it will not transfer the token when requested by the external port; 3) coming out of a halted state, if the communication port 50 has the token it may transmit data if necessary; if it needs the token, it will arbitrate for the token as described herein-above.
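The halting rules above reduce to two decisions, which may be sketched as follows. This is a simplified reading of the text, assuming the port is idle (no word in flight) when the decision is made; the function names `releases_token` and `requests_token` and their parameters are hypothetical.

```python
def releases_token(has_token: bool, ich: int, token_requested: bool) -> bool:
    """True if the port hands the token to the external port.

    A port holding the token gives it up on request only while its input
    FIFO is not halted (ICH = 0); with ICH = 1 it keeps the token so that
    no data can enter the input FIFO.
    """
    return has_token and token_requested and ich == 0


def requests_token(has_token: bool, och: int, output_pending: bool) -> bool:
    """True if the port arbitrates for the token.

    A port without the token asks for it only when it has data to send and
    its output FIFO is not halted (OCH = 0).
    """
    return (not has_token) and output_pending and och == 0


if __name__ == "__main__":
    # Scenario 1: token held, input not halted, output halted.
    print(releases_token(True, ich=0, token_requested=True))
    # Scenario 2: token held, input halted, output halted.
    print(releases_token(True, ich=1, token_requested=True))
```

Note that in this reading the OCH bit does not affect whether the token is released; only ICH does, which matches scenarios 1 and 2.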

[0240] FIG. 15 further shows port logic unit 560 with interrupt signals OCRDY (output channel ready), ICRDY (input channel ready), ICFULL (input channel full), and OCEMPTY (output channel empty) that are connected to interrupt logic 250. Port logic unit 560 generates those interrupts based upon signals on lines input level and output level from input FIFO 540 and output FIFO 550, respectively. But information (PINF) from port arbitration unit 520 and FIFO information from the FIFO registers are fed to port logic unit 560, which supplies port control register 510 with input channel level, output channel level and port direction information.

[0241] The communication ports support three principal modes of synchronization: a ready/not ready signal that can halt CPU and DMA accesses to a communication port; interrupts that can be used to signal the CPU and DMA; and status flags in the communication port control register which can be polled by the CPU.

[0242] The most basic synchronization mechanism is based on a ready/not ready signal. If the DMA or CPU attempts to read an empty input FIFO, a not ready signal is returned and the DMA or CPU will continue the read until a ready signal is received. The ready signal for the output channel is OCRDY (output channel ready), which is also an interrupt signal. The ready signal for the input channel is ICRDY (input channel ready), which is also an interrupt signal.

[0243] Interrupts are often a useful form of synchronization. Each communication port generates four different interrupt signals: ICRDY (input channel ready), ICFULL (input channel full), OCRDY (output channel ready) and OCEMPTY (output channel empty). The CPU responds to any of these four interrupt signals. The DMA coprocessor responds to the ICRDY and OCRDY interrupt signals.

[0244] The third mode of synchronization that can be employed in the preferred embodiment is CPU polling. The CPU can be set up to poll the status flags in communication port control registers at predetermined intervals or occurrences during the operation of the data processing device.

[0245] In addition to the communication ports 50-55, the preferred embodiment incorporates a special split mode DMA capability that transforms one DMA channel into two DMA channels, one dedicated to receiving data from a communication port and writing it to a location in the memory map, and one dedicated to reading data from a location in the memory map and writing it to a communication port. All six DMA channels can support any of the six communication ports.

[0246] In the present embodiment data words are thirty-two bits wide; however, interface port 530 has a bus eight bits wide. Thus, interface port 530 adjusts for the disparity by having an I/O port 531, an input and output data shifter 533, a multiplexer 536 and a thirty-two bit buffer register 539. For example, to receive incoming data from the external port, a signal CSTRB_ precedes the data, signaling to communication port 50 the presence of valid data on bus CD(7-0). Of course, the external port has possession of the qualifying token, thus allowing it to transmit data. The incoming data is received by I/O port 531, where data shifter 533 shifts the received data via multiplexer 536 to the proper packet location within the thirty-two bit buffer register 539. After I/O port 531 receives data from bus CD(7-0), it sends signal CRDY_ to confirm the receipt of data from the external port. Since bus CD(7-0) is eight bits wide, a data word is divided into four consecutive eight-bit packets to make up the thirty-two bit word used in the preferred embodiment. When four packets of eight bits of data are placed in buffer register 539, port arbitration unit 520 sends signal SAVEFIF to FIFO control of input FIFO 540, and the contents of the buffer register 539 are stored to input FIFO 540, where the data is accessed via peripheral bus 28 as described herein-above.

[0247] To transmit data to the external port, output FIFO 550 receives thirty-two bit data words from peripheral bus 28 d. Port arbitration unit 520 sends signal LOADBUF to FIFO control of output FIFO 550 and the contents of output FIFO 550 are transferred to buffer register 539. Multiplexer 536 selects eight-bit packets that are shifted using data shifter 533 via I/O port 531 onto the eight-bit communication bus CD(7-0). It should be noted that possession of the qualifying token by port arbitration unit 520 is assumed in the transmission described above. Communication port 50 signals valid data with CSTRB_ via I/O port 531. Data is transferred via eight-bit bus CD(7-0). The external port receiving the data from bus CD(7-0) signals the transmitting communication port 50 with CRDY_, thereby acknowledging that data is received and completing a packet transfer. Three other packets are transferred to complete the thirty-two bit data word.
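The division of a thirty-two bit word into four eight-bit packets, and its reassembly in the buffer register, can be sketched as below. The packet order on bus CD(7-0) is not stated in this description; the sketch assumes the least significant byte is sent first, and the function names `word_to_packets` and `packets_to_word` are hypothetical.

```python
def word_to_packets(word: int) -> list:
    """Split a 32-bit word into four 8-bit packets.

    ASSUMPTION: least significant byte first; the actual order used on
    CD(7-0) is not given in the text.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in thirty-two bits")
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def packets_to_word(packets: list) -> int:
    """Reassemble four 8-bit packets into a 32-bit word (same byte order)."""
    if len(packets) != 4:
        raise ValueError("exactly four packets make up one word")
    word = 0
    for i, packet in enumerate(packets):
        # Shift each packet into its position, as the data shifter would.
        word |= (packet & 0xFF) << (8 * i)
    return word


if __name__ == "__main__":
    packets = word_to_packets(0x12345678)
    print([hex(p) for p in packets])
    print(hex(packets_to_word(packets)))
```

Whatever byte order is chosen, the transmitting and receiving ports must agree on it for the round trip to reproduce the original word.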

[0248] FIG. 18a illustrates the timing for a token transfer sequence between two communication ports, A and B. FIG. 18b continues the timing diagram to illustrate a word transfer sequence followed by the start of another word transfer sequence. In order to accurately describe the timing of the operation of the communication ports, it is important to differentiate between the internal signals applied to the pins and the external status seen at the interface between the communication ports. Referring to FIG. 17, for internal signals applied to a buffer, a suffix ‘a’ denotes processor A and ‘b’ denotes processor B. The external signal between the two connected communication ports is denoted by a concatenation of ‘a’ and ‘b.’ The value that a processor sees by sampling the output pad is denoted with a single right quote (’). All signals are buffered and can be placed in a high impedance state. Clocks H1 and H3 are generated within the clock generator circuit 200 and are used to synchronize communication port transfers.

[0249] The numbers shown on FIGS. 18a and 18b correspond to the numbers in the following description. Each number describes the events occurring that correspond to an instant represented by the corresponding number on the timing diagrams shown in FIGS. 18a and 18b. It should be noted that negative true signals are represented with a bar above the signal in FIGS. 18a and 18b, while an underscore after the signal is used in the following description. Also, the signal CST of FIGS. 18a and 18b is equivalent to the signal CSTRB in the herein description.

[0250] Referring to FIG. 18a, a token request and token transfer sequence proceeds as follows:

[0251] 1—B requests the token by bringing CREQb_ low.

[0252] 2—A sees the token request when CREQa’_ goes low

[0253] 3—A acknowledges the request, after a type 1 delay from CREQa’_ falling, by bringing CACKa_ low.

[0254] 4—B sees the acknowledge from A when CACKb’_ goes low.

[0255] 5—A switches CRDYa_ from tristate to high on the first H1 rising after CACKa_ falling.

[0256] 6—A tristates CDa(7-0) on the first H1 rising after CACKa_ falling.

[0257] 7—B switches CSTRBb_ from tristate to high after a type 2 delay from CACKb’_ falling.

[0258] 8—B brings CREQb_ high after a type 1 delay from CACKb’_ falling.

[0259] 9—A sees CREQa’_ go high.

[0260] 10—A brings CACKa_ high after CREQa’_ goes high.

[0261] 11—A tristates CSTRBa_ after CREQa_ goes high.

[0262] 12—A tristates CACKa_ after CREQa’_ goes high and after Ka_ goes high.

[0263] 13—A switches CREQa_ from tristate to high after CREQa’_ goes high.

[0264] 14—B tristates CREQb_ after CREQb_ goes high.

[0265] 15—B switches CACKb_ from tristate to high after CREQb_ goes high.

[0266] 16—B tristates CRDYb_ on the H1 rising after CREQb_ goes high.

[0267] 17—B drives the first byte onto CDb(7-0) on the H1 rising after CREQb_ goes high.

[0268] 18—A sees the first byte on CDa’ (7-0).

[0269] 19—B brings CSTRBb_ low on the second H1 rising after CREQb_ rising.

[0270] 20—A sees CSTRBa’_ go low, signalling valid data.

[0271] 21—A reads the data and brings CRDYa_ low.

[0272] 22—B sees CRDYb’_ go low, signalling data has been read.

[0273] 23—B drives the second byte on CDb(7-0) after CRDYb’_ goes low.

[0274] 24—A sees the second byte on CDa’ (7-0).

[0275] 25—B brings CSTRBb_ high after CRDYb’_ goes low.

[0276] 26—A sees CSTRBa’_ go high.

[0277] 27—A brings CRDYa_ high after CSTRBa’_ goes high.

[0278] 28—B sees CRDYb’_ go high.

[0279] 29—B brings CSTRBb_ low after CRDYb’_ goes high.

[0280] 30—A sees CSTRBa’_ go low, signalling valid data.

[0281] 31—A reads the data and brings CRDYa_ low.

[0282] 32—B sees CRDYb’_ go low, signalling data has been read.

[0283] 33—B drives the third byte on CDb(7-0) after CRDYb’_ goes low.

[0284] 34—A sees the third byte on CDa(7-0).

[0285] 35—B brings CSTRBb_ high after CRDYb’_ goes low.

[0286] 36—A sees CSTRBa’_ go high.

[0287] The following events are used in FIG. 18b illustrating the timing for a word transfer between communication ports A and B. It should be noted that the events described above also apply to the timing between communication ports A and B shown in FIG. 18b.

[0288] 36—A sees CSTRBa’_ go high.

[0289] 37—A brings CRDYa_ high after CSTRBa’_ goes high.

[0290] 38—B sees CRDYb’_ go high.

[0291] 39—B brings CSTRBb_ low after CRDYb’_ goes high.

[0292] 40—A sees CSTRBa’_ go low, signalling valid data.

[0293] 41—A reads the data and brings CRDYa_ low.

[0294] 42—B sees CRDYb’_ go low, signalling data has been read.

[0295] 43—B drives the fourth byte on CDb(7-0) after CRDYb’_ goes low.

[0296] 44—A sees the fourth byte on CDa(7-0).

[0297] 45—B brings CSTRBb_ high after CRDYb’_ goes low.

[0298] 46—A sees CSTRBa’_ go high.

[0299] 47—A brings CRDYa_ high after CSTRBa’_ goes high.

[0300] 48—B sees CRDYb’_ go high.

[0301] 49—B brings CSTRBb_ low after CRDYb’_ goes high.

[0302] 50—A sees CSTRBa’_ go low, signalling valid data.

[0303] 51—A reads the data and brings CRDYa_ low.

[0304] 52—B sees CRDYb’_ go low, signalling data has been read.

[0305] 53—B brings CSTRBb_ high after CRDYb’_ goes low.

[0306] 54—A sees CSTRBa’_ go high.

[0307] 55—A brings CRDYa_ high after CSTRBa’_ goes high.

[0308] 56—B sees CRDYb’_ go high.

[0309] 57—B drives the first byte of the next word onto CDb(7-0) after a type 1 synchronizer delay from CRDYb’_ falling (52).

[0310] 58—A sees the first byte of the next word on CDa(7-0).

[0311] 59—B lowers CSTRBb_ after a type two delay from CRDYb’_ falling.
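The strobe and ready handshake that repeats for each byte in the event list above can be summarized in a simplified sketch. It is illustrative only: synchronizer delays, tristating and the H1/H3 clocks are omitted, and the sender is shown driving each byte at the start of its own handshake, whereas in the timing diagrams the next byte is driven while CSTRB_ is still low. The function name `transfer_word` is hypothetical.

```python
def transfer_word(packets):
    """Move 8-bit packets from sender B to receiver A.

    Returns (received, trace), where trace lists the handshake steps in
    the order they occur in this simplified model.
    """
    cstrb_n = 1  # CSTRB_ is active low; 1 means no valid data signalled
    crdy_n = 1   # CRDY_ is active low; 1 means no read acknowledged
    received, trace = [], []
    for index, packet in enumerate(packets):
        cd = packet & 0xFF  # B drives the packet onto CD(7-0)
        trace.append(f"B drives byte {index} on CD(7-0)")
        cstrb_n = 0  # B signals valid data
        trace.append("B brings CSTRB_ low")
        received.append(cd)  # A samples the bus
        crdy_n = 0  # A acknowledges that the data has been read
        trace.append("A reads data and brings CRDY_ low")
        cstrb_n = 1  # B ends the strobe
        trace.append("B brings CSTRB_ high")
        crdy_n = 1  # A releases the acknowledge
        trace.append("A brings CRDY_ high")
    # Both handshake lines are idle again once the word is complete.
    assert cstrb_n == 1 and crdy_n == 1
    return received, trace


if __name__ == "__main__":
    data, steps = transfer_word([0x78, 0x56, 0x34, 0x12])
    print(data)
    for step in steps:
        print(step)
```

Each packet therefore costs one full cycle of CSTRB_ and CRDY_, and four such cycles complete a thirty-two bit word.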

[0312] FIG. 19 shows an embodiment of a stand-alone configuration of the improved data processing device configured to show connections to a plurality of memories 350 and 351 and peripheral devices 360 and 361. Global peripheral port 24 and local peripheral port 26 provide the interface to the external devices. For example, bus 380 can be used for program accesses and bus 390 can be used for data or I/O accesses, which allows for simultaneous external program and data accesses. Microcomputer 10 also has available six communication channels capable of interfacing to other systems in I/O intensive applications. Peripherals and other external devices such as keyboards, monitors, disk drives, printers, displays, transducers, modems, processors, local area networks (LANs), and other devices, known or hereafter devised, with which the system commends its use can be connected to the peripheral ports 24 and 26 and communication ports 50-55.

[0313] FIGS. 31-43 show embodiments of various parallel processing system architecture configurations which are possible with a plurality of the improved data processing devices of this preferred embodiment with external memory.

[0314] For example, FIG. 20 specifically shows parallel processing system architecture with external memory in the form of building blocks where memories 350 and 351 can be interfaced via bus 380 and bus 390 and communication ports for communication to additional data processing devices of this preferred embodiment and comparable like communication ports. Alternatively, as shown in FIG. 21, the parallel system building block can be another microcomputer 10 effectuating communication via communication ports 50-55 and peripheral ports. The flexibility in the multitude of connections possible with microcomputer 10 offers a vast variety of systems.

[0315] One possible system shown in FIG. 22 is a pipelined linear array using three microcomputers 10 connected in a serial configuration. Another system is shown in FIG. 23, where a plurality of microcomputers 10 are connected in a bi-directional ring, with more than one communication port between two of the microcomputers 10, thus increasing the communication bandwidth between those two microcomputers.

[0316] The parallel processing system architecture of FIG. 24 is arranged in the form of a tree. Again, the communication ports are used to connect between the trunks and branches and between parent and children, and even more architectures are possible by variants of the illustration in FIG. 24.

[0317] FIG. 25 illustrates how communication ports support a variety of two-dimensional structures where a two-dimensional mesh is constructed using only four of the communication ports and nine microcomputers 10. A two-dimensional structure of hexagonal mesh and even higher dimensional structures are also supported as shown in FIG. 26.

[0318] FIG. 27 shows a three-dimensional grid supported by six communication ports. The microcomputer 10 in the center has all six communication ports connected to six other microcomputers 10, each using only one communication port and having the remaining five communication ports in each unit available for further expansion of this three-dimensional grid or extra memory or other like uses. An even higher dimensional structure in the form of a four-dimensional hypercube is also possible as shown in FIG. 28. Other higher dimensional structures are also possible to the person of ordinary skill in the art.
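By way of illustration only, the connectivity of a four-dimensional hypercube can be computed as follows. The node numbering is an assumption made for this sketch (nodes numbered 0 through 15, with two nodes linked when their numbers differ in exactly one bit); it is not taken from FIG. 28, and the function name `hypercube_neighbors` is hypothetical.

```python
def hypercube_neighbors(node: int, dimensions: int = 4) -> list:
    """Return the nodes directly linked to `node` in a binary hypercube.

    ASSUMPTION: nodes are numbered 0 .. 2**dimensions - 1 and two nodes
    are connected when their numbers differ in exactly one bit.
    """
    if not 0 <= node < (1 << dimensions):
        raise ValueError("node number out of range")
    # Flipping each bit in turn gives one neighbor per dimension.
    return [node ^ (1 << d) for d in range(dimensions)]


if __name__ == "__main__":
    for node in range(16):
        print(node, hypercube_neighbors(node))
```

Under this numbering each of the sixteen nodes has four neighbors, so four of the six communication ports per microcomputer would suffice, leaving two free.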

[0319] A variation of the parallel processing system architecture configuration is illustrated in FIG. 29 where combinations of shared memories 350 and 351 and microcomputer-to-microcomputer communication are possible. FIG. 30 illustrates a parallel system where each microcomputer 10 has local memory that can be shared between other microcomputers 10 via communication ports.

[0320] A system application having private local memories 340, 341, and 342 and a shared global memory 350 is illustrated in FIG. 31. Global memory 350 is attached to external bus 380 while local memories 340, 341, and 342 private to each microcomputer 10 are attached to auxiliary bus 390. Another variation is illustrated in FIG. 32 where microcomputers 10 share global memories 350 and 351 via external bus 380 and auxiliary bus 390.

[0321] FIG. 33 illustrates a parallel processing system where some remote microcomputers 10 are connected via modem link 450, 451, 452 and 453 to their respective communication ports 50-55 while other local microcomputers 10 are connected directly via communication ports 50-55. Keyboard 460, display assembly 461 and mass data media 465 are connected to local microcomputer 10 via communication ports.

[0322] The flexibility from the various communication port connections and memory sharing capabilities of microcomputers 10 provides systems that can be optimized for applications using a single microcomputer 10 or multiple microcomputers 10. One possible system is in the field of robotics as shown in FIG. 34. Using microcomputer 10 as the building block, the interactive interfacing required for the various functions of a robot 900 is accomplished. For example, robot 900 equipped with vision recognition from sensor assembly 910 makes contact with an item out of its reach. Signals 915 are sent to control logic 920, which supplies signals to control the operation of computation system 930 consisting of a plurality of parallel processing microcomputers 10. System 930 receives program instructions from program memory 940. Data memory 950 provides data storage for system 930. Command signals from system 930 are generated and transformed from digital to analog signals using D/A 955 to control motors 960 for moving the various joints of robot 900. Analog signals 958 provide the motor controls. While motors 960 are receiving control signals, motors 960 are also providing feedback analog signals 948 which are converted to digital signals via A/D converter 945. The computation system 930, utilizing the feedback signals 948 from motors 960, determines new motor control signals to be sent to motors 960, ensuring effective movement of robot 900. Additionally, as the robot moves, vision recognition control relays distance and direction information back to control logic 920. Other functions of robot 900, such as speech synthesis via speakers 912 and speech recognition from sensor assembly 910, also have a high degree of interactiveness that system 900 is capable of accommodating. As more and more functions and requirements of the system develop, additional microcomputers 10 can be readily connected to system 900.

[0323] Applications that utilize complex algorithms are well suited for the herein-described preferred embodiments. Such applications include speech-recognition technology, cellular radio phones, video teleconferencing, and multiplexing four voice conversations on leased 64-Kbit/s lines that formerly could carry only one. A large number of other computationally-intensive problems are well-suited for parallel processing, such as 3D graphics, control, array processors, neural networks, and numerous other applications listed in the coassigned applications incorporated herein by reference.

[0324] Systems that have interactions with their components and other systems benefit from the parallel processing system architecture configuration of microcomputer 10. Microcomputers 10 can be built upon to suit the needs of a system as system requirements grow. With the many communication ports, commands and interactive signals can be directed to the proper microcomputer 10 or multiple microcomputers 10 to respond to those commands and interactive signals.

[0325] FIG. 35 shows the circuit diagram for multiplexing data for four new three-operand instructions as well as other instructions. The various modes include (109) 8-bit immediate (short immediate), integer immediate (signed and unsigned), floating point immediate, direct, indirect, and long immediate. Short immediate and indirect (integer and floating point) are used by the four new three-operand instructions. The multiplexer for register mode is contained in the register file.

[0326] FIG. 36a illustrates the circuit diagram used to count the three instructions fetched after a delayed instruction, including delayed trap (LAT) and delayed Repeat Block (APTBO). The counter is reset by (DLYBR) whenever a delayed instruction is decoded. The counter counts every time the Program Counter is updated. By keeping track of the program counter updates, wait states are inserted due to pipeline conflicts. Pipeline conflicts occur when a task takes more than one system clock cycle to complete.

[0327] FIG. 36b illustrates a circuit with an incrementer used for the delayed trap instruction. When the fetch of the third instruction after a delayed trap begins, the program counter (PC) is loaded with the trap vector. PC+4 needs to be stored in PC+4 register 210 since the program needs to return to PC+4. The PC is at PC+3, and the incrementer shown in FIG. 36b increments it to PC+4 before it is stored in stack memory.

[0328] Although the invention has been described in detail herein with reference to its preferred embodiment, it is to be understood that this description is by way of example only, and is not to be construed in a limiting sense. It is to be further understood that numerous changes in the details of the embodiments of the invention, and additional embodiments of the invention, will be apparent to, and may be made by, persons of ordinary skill in the art having reference to this description. It is contemplated that such changes and additional embodiments are within the spirit and true scope of the invention as claimed below.

What is claimed is:
 1. A data processing device comprising: a storage circuit accessible by assertion of addresses; an arithmetic logic unit, connected to said storage circuit, operative to perform an arithmetic operation on data received by said arithmetic unit; an address register for storing an initial address word indicative of a storage circuit address; an instruction decode and control unit, connected to said storage circuit and having an instruction register operative to hold a program instruction, said instruction decode and control unit operative to decode the program instruction into control signals to control the operations of the data processing device and location codes to control data transfers according to predetermined sections of the program instruction wherein at least one of the sections includes a location section selecting said address register and a displacement section containing address data; and an address generating unit, connected to said storage circuit, said instruction register, and said address register responsive to the control signals from said instruction decode and control unit combining the initial address word from said address register and the address data from the displacement section to generate a storage circuit address.
 2. The data processing device of claim 1 wherein the instruction register section includes a register section decoded to select a data register containing data for said arithmetic logic unit.
 3. The data processing device of claim 1 wherein the instruction register includes an immediate section decoded to contain immediate data for said arithmetic logic unit.
 4. The data processing device of claim 3 wherein said arithmetic logic unit further comprises a multiplexer having a first input connected to said storage circuit and a second input connected to the output of said instruction decode and control unit.
 5. The data processing device of claim 1 wherein the instruction register includes a register section decoded to select a data register for storing data resulting from an arithmetic operation by said arithmetic logic unit.
 6. The data processing device of claim 5 wherein the instruction register includes sections decoded to select at least three data locations.
 7. The data processing device of claim 5 further comprising an address register file and a data register file wherein said instruction decode and control unit is operative to provide control signals for selecting the register files.
 8. The data processing device of claim 7 wherein said address generating unit includes a first and a second auxiliary arithmetic logic unit each connected to said storage circuit and the address register file for concurrently generating storage addresses.
 9. The data processing device of claim 7 wherein the data register file, connected to said arithmetic logic unit, is operative to store data for said arithmetic logic unit.
 10. The data processing device of claim 7 wherein said arithmetic logic unit further comprises a multiplexer having a first input connected to said storage circuit and a second input connected to the data register.
 11. The data processing device of claim 1 further comprising: an address bus connected to said address generating unit and said storage circuit; and a data bus connected to said arithmetic logic unit and said storage circuit.
 12. A data processing device comprising: a memory having aplurality of addressable memory locations; an arithmetic logic unit,connected to said memory, operative to perform an arithmetic operationon operands received by said arithmetic unit; a data register, connectedto said arithmetic logic unit, operative to store a register operand forsaid arithmetic logic unit; and an instruction decode and control unit,connected to said memory and said arithmetic logic unit, having aninstruction register operative to hold a program instruction, saidinstruction decode and control unit operative to decode the programinstruction into control signals to control the operations of the dataprocessing device and to select at least two operands for saidarithmetic logic unit according to sections of the program instructionwherein the sections include a register section selecting said dataregister for said register operand and an immediate data sectioncontaining an immediate operand.
 13. The data processing device of claim 12 wherein the instruction register includes sections decoded to select at least three data locations.
 14. The data processing device of claim 12 further comprising a data register file wherein said instruction decode and control unit is operative to provide control signals for selecting the register file.
 15. The data processing device of claim 14 wherein the data register file, connected to said arithmetic logic unit, is operative to store data for said arithmetic logic unit.
 16. The data processing device of claim 12 further comprising a data bus connected to said arithmetic logic unit and said memory.
 17. The data processing device of claim 12 wherein said arithmetic logic unit further comprises a multiplexer having a first input connected to said memory and a second input connected to the output of said instruction decode and control unit.
 18. The data processing device of claim 14 wherein said arithmetic logic unit further comprises a multiplexer having a first input connected to said memory and a second input connected to the data register.
 19. A data processing system comprising: a data processing device including: a storage circuit accessible by assertion of addresses; an arithmetic logic unit, connected to said memory, operative to perform an arithmetic operation on data received by said arithmetic unit; an address register for storing an initial address word indicative of a storage circuit address; an instruction decode and control unit, connected to said storage circuit having an instruction register operative to hold a program instruction, said instruction decode and control unit operative to decode the program instruction providing control signals to control the operations of the data processing device and to specify locations for data transfers according to fields of the program instruction wherein at least one of the fields includes a location section that specifies said address register and a displacement section containing address data; and an address generating unit, connected to said storage circuit, said instruction register, and said address register responsive to the control signals from said instruction decode and control unit combining the initial address word from said address register and the address data from the displacement section to generate the new storage circuit address; and a circuit card having external terminals, connected to the data processing system, operative to exchange data signals between the data processing system and the external terminals.
 20. The data processing system of claim 19 further comprising a host microprocessor, connected to said data processing device, operative to provide control and data signals for controlling the operation of the data processing system.
 21. The data processing system of claim 19 further comprising a memory having a plurality of addressable locations connected to said data processing device.
 22. The data processing system of claim 21 further comprising a memory controller connected to said memory and said data processing device, operative to provide control signals for controlling access to the addressable locations.
 23. The data processing system of claim 19 further comprising communication devices connected to said data processing device operative to exchange data signals between the data processing system and the external terminals.
 24. A data processing system comprising: a data processing device including: a memory having a plurality of addressable memory locations; an arithmetic logic unit, connected to said memory, operative to perform an arithmetic operation on operands received by said arithmetic unit; a data register, connected to said arithmetic logic unit, operative to store a register operand for said arithmetic logic unit; and an instruction decode and control unit, connected to said memory and said arithmetic logic unit, having an instruction register operative to hold a program instruction, said instruction decode and control unit operative to decode the program instruction into control signals to control the operations of the data processing device and to select at least two operands for said arithmetic logic unit according to sections of the program instruction wherein the sections include a register section selecting said data register for said register operand and an immediate data section containing an immediate operand; a coprocessor, connected to said data processing device, operative to provide control and data signals for controlling the operation of the data processing system; and a peripheral device, connected to said coprocessor, operative to provide communication and data signals for exchanging data and status signals between said data processing system and another data processing system.
 25. The data processing system of claim 24 further comprising another data processing device connected to said first named data processing device, said coprocessor, and said peripheral device operative to process data received from said connected devices and said coprocessor.
 26. The data processing system of claim 24 further comprising a circuit board having external terminals connected to said data processing device, said coprocessor and said peripheral device.
 27. A method of operating a data processing device comprising the steps of: storing data in a memory accessible by assertion of addresses; performing an arithmetic operation on data received by an arithmetic logic unit; decoding a program instruction into control signals to control the operations of the data processing device and location codes to control data transfers according to predetermined sections of the program instruction wherein at least one of the sections includes a location section selecting said address register and a displacement section containing address data; and generating addresses responsive to the control signals from said instruction decode and control unit by combining an address word from said address register and the address data from the displacement section.
 28. The method of claim 27 further comprising the step of decoding the program instruction to select a data register containing data for said arithmetic logic unit.
 29. The method of claim 27 further comprising the step of decoding the program instruction to select immediate data contained in a section of the program instruction.
 30. The method of claim 27 further comprising the step of multiplexing a first input from the memory and a second input from an output of the decoded program instruction to an input of said arithmetic logic unit.
 31. A method of operating a data processing device comprising the steps of: storing data in a memory accessible by assertion of addresses; performing an arithmetic operation on data received by an arithmetic logic unit; storing data in a data register so that said arithmetic logic unit operates on the data stored in said data register; and decoding a program instruction into control signals to control the operations of the data processing device and to select at least two operands for said arithmetic logic unit according to sections of the program instruction wherein the sections include a register section selecting said data register for said register operand and an immediate data section containing an immediate operand.
 32. The method of claim 31 further comprising the step of multiplexing a first input from said memory and a second input from an output of the decoded program instruction to an input of said arithmetic logic unit.
 33. The method of claim 31 further comprising the step of multiplexing a first input from said memory and a second input from the data register to an input of said arithmetic logic unit.
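
By way of illustration only, the decoding and address generating steps recited in claims 27 and 31 can be sketched in software. The sketch below is not part of the claims and does not describe the disclosed circuitry. The field names (`areg`, `displacement`, `dreg`, `immediate`), the 16-bit address width, and the use of addition as the "combining" operation are all assumptions made for the example; the claims do not fix any of them.

```python
from dataclasses import dataclass

# Assumed 16-bit address space; the claims do not specify a width.
ADDR_MASK = 0xFFFF


@dataclass
class Instruction:
    """Hypothetical decoded instruction; field names are illustrative only."""
    areg: int          # location section: selects an address register
    displacement: int  # displacement section: address data
    dreg: int          # register section: selects a data register
    immediate: int     # immediate data section: immediate operand


def generate_address(address_regs, instr):
    """Combine the address word held in the selected address register with
    the displacement. Addition with wraparound is an assumed combining rule."""
    return (address_regs[instr.areg] + instr.displacement) & ADDR_MASK


def select_operands(data_regs, instr):
    """Select two ALU operands: one from the selected data register and one
    taken directly from the instruction's immediate data section."""
    return data_regs[instr.dreg], instr.immediate


def alu_add(a, b):
    """Stand-in arithmetic operation (addition chosen arbitrarily)."""
    return a + b
```

For example, with address registers `[0x0100, 0x2000]` and an instruction selecting register 1 with displacement `0x10`, `generate_address` yields `0x2010`; with data registers `[7, 9]`, register section 0 and immediate 5, `select_operands` yields the pair `(7, 5)`.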